GPU performance issues


I am trying to self-host a LibreTranslate service on a GPU server, but translation is no faster than on my CPU server, which surprises me.
My first hypothesis was that my LibreTranslate setup was not using the GPU, but when I run a tool like nvtop while translating, it shows that the GPU is being used.
Maybe someone here has already worked on a similar setup and knows what I am doing wrong?

I am using the command docker-compose -f docker-compose.cuda.yml up -d --build, and I tried adding this line to the docker-compose.cuda.yml file:

    runtime: nvidia

Here is a description of my current setup:

Ubuntu: 20.04.5 LTS
Docker: 20.10.21 (+ nvidia-docker)
Docker-compose: 1.29.2
Driver: nvidia-driver-520

Thank you for your help!


Based on the CTranslate2 benchmarks, I would expect GPU translation to be significantly faster than CPU translation. My best guess is that the GPU gives you higher throughput (more translations per second when many requests arrive at once) without improving the latency of a single request, so the difference isn’t noticeable if you’re the only one using the server at that time. I haven’t done much CTranslate2 inference on GPUs myself (because the CPU performance is so good :rocket:).
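
One way to check the throughput-versus-latency guess is to send the same set of requests first one at a time and then concurrently, and compare the numbers on both servers. The following is only a rough sketch, not a proper benchmark: it assumes the LibreTranslate API is reachable at http://localhost:5000/translate (adjust to your host and port), that no API key is required, and that the en and fr models are installed. The `benchmark` helper and the sample sentence are made up for illustration.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: LibreTranslate listens on localhost:5000; change this for your server.
URL = "http://localhost:5000/translate"


def translate(text, url=URL, source="en", target="fr"):
    """Send one request to the LibreTranslate /translate endpoint."""
    payload = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["translatedText"]


def benchmark(fn, texts, workers):
    """Call fn(text) for every text using `workers` threads.

    Returns the mean per-request latency (seconds) and the overall
    throughput (requests per second of wall-clock time).
    """

    def timed(text):
        start = time.perf_counter()
        fn(text)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, texts))
    wall = time.perf_counter() - wall_start

    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(texts) / wall,
    }


# Example usage (requires a running server, so it is left commented out):
# texts = ["Hello, how are you today?"] * 32
# for workers in (1, 8):
#     print(workers, benchmark(translate, texts, workers))
```

If the guess is right, the run with one worker should show similar mean latency on the CPU and GPU servers, while the concurrent run should show a larger throughput gap in favor of the GPU. If neither differs, the bottleneck is probably somewhere else in the setup.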