I am trying to self-host a LibreTranslate service on a GPU server, but translation does not seem to be any faster than on my CPU server, which surprises me.
My first hypothesis was that my LibreTranslate setup was not using the GPU, but when I run a tool like
nvtop while translating, it shows that the GPU is being used.
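One way to put numbers on the comparison is to time identical requests against both servers. The sketch below is only an illustration under stated assumptions: it assumes the instance is reachable at http://localhost:5000 and that the en to fr language pair is installed, and the helper names `translate` and `benchmark` are hypothetical, not part of LibreTranslate. It uses the `/translate` HTTP endpoint with a JSON body.

```python
import json
import statistics
import time
import urllib.request


def translate(text, base_url="http://localhost:5000", source="en", target="fr"):
    """Send one request to the /translate endpoint and return the translated text.

    base_url, source and target are assumptions; adjust them to your deployment.
    """
    payload = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))["translatedText"]


def benchmark(translate_fn, texts, warmup=1):
    """Time translate_fn on every text and return latency statistics in seconds."""
    # Warm-up calls are not timed, so one-off costs such as model loading
    # do not distort the measurement.
    for text in texts[:warmup]:
        translate_fn(text)

    timings = []
    for text in texts:
        start = time.perf_counter()
        translate_fn(text)
        timings.append(time.perf_counter() - start)

    return {
        "requests": len(timings),
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }
```

Calling `benchmark(translate, samples)` with the same list of sample sentences on the CPU server and on the GPU server gives two sets of latencies to compare. Trying both short and long inputs may help show whether the result depends on request size.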
Maybe someone here has already worked on a similar setup and knows what I am doing wrong?
I am using the command
docker-compose -f docker-compose.cuda.yml up -d --build
and I tried adding these lines to the Compose file:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
Here is a description of my current setup:
Ubuntu: 20.04.5 LTS
Docker: 20.10.21 (+ nvidia-docker)
GPU: NVIDIA A100-SXM4-40GB
Thank you for your help!