Unable to task more than 4 threads

Hi! Thanks for the great projects. I’m self-hosting and having an issue where, despite having an 8-core/16-thread CPU, I’m unable to get LibreTranslate/Argos Translate to use more than 4 threads at once. I’ve tried submitting large single jobs, as well as multiple smaller jobs in parallel through the REST API. RAM utilization does not appear to be a limiting factor. LT_THREADS is set to 16, and htop seems to indicate that I am spinning up that many threads.

Checking my understanding a bit here:

I’ve tried running LibreTranslate both via pip and Docker and have observed the same 4 thread limit with both. What am I missing here? Is the intention to spin up multiple LibreTranslate services and round-robin to them via a reverse proxy? Or, should I be seeing more utilization than I currently am when submitting large numbers of parallel requests to a single LibreTranslate server?

Thanks!

Try running with gunicorn and choose a suitable number of workers: GitHub - LibreTranslate/LibreTranslate: Free and Open Source Machine Translation API. Self-hosted, offline capable and easy to setup.
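
For anyone following along, gunicorn's worker count can be set in a Python config file. The following is a minimal sketch, not LibreTranslate's own scripts/gunicorn_conf.py. The file name gunicorn.conf.py and the cap of 4 workers are assumptions for illustration. By default each worker is a separate process that loads its own copy of the app, so memory use grows with the worker count.

```python
# gunicorn.conf.py (hypothetical file name; pass it with `gunicorn -c`)
import multiprocessing

# Address and port to listen on, matching the --bind value used in this thread.
bind = "0.0.0.0:5000"

# Number of worker processes. The cap of 4 is illustrative, not a recommendation.
workers = min(4, multiprocessing.cpu_count())
```

It would be started with something like gunicorn -c gunicorn.conf.py 'wsgi:app()'. The same setting is also available on the command line as --workers.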

Sorry if I’m missing something, but gunicorn --bind 0.0.0.0:5000 'wsgi:app(threads="16")' isn’t creating different behavior for me. I’m not seeing an argument for “workers”.

It looks like gunicorn uses the same get_args function (LibreTranslate/scripts/gunicorn_conf.py at main · LibreTranslate/LibreTranslate · GitHub), and that function feeds threads into serve(), not into the number of threads CTranslate2 uses for translation, which is 4 by default: Multithreading and parallelism — CTranslate2 4.3.1 documentation

I’m sure I’m missing something obvious, thanks for your help so far!

Adding intra_threads=16 here allowed me to task all 16 threads of my CPU.
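
A minimal sketch of what that change amounts to, assuming the model is loaded through ctranslate2.Translator on CPU. The helper names and the model directory are hypothetical; inter_threads and intra_threads are the constructor arguments described in the CTranslate2 documentation linked above.

```python
import os
from typing import Optional


def translator_kwargs(intra_threads: Optional[int] = None, inter_threads: int = 1) -> dict:
    """Build keyword arguments for ctranslate2.Translator on CPU.

    If intra_threads is not given, fall back to the number of logical CPUs
    reported by the OS (an assumption for this sketch, not a library default).
    """
    if intra_threads is None:
        intra_threads = os.cpu_count() or 1
    return {
        "device": "cpu",
        "inter_threads": inter_threads,
        "intra_threads": intra_threads,
    }


def load_translator(model_dir: str, intra_threads: Optional[int] = None):
    """Load a CTranslate2 model with an explicit thread count."""
    import ctranslate2  # imported lazily so the helper above works without it

    return ctranslate2.Translator(model_dir, **translator_kwargs(intra_threads))
```

For example, load_translator("/path/to/model", intra_threads=16) would request 16 threads per translation on a 16-thread CPU; the path here is a placeholder.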


Thanks for looking into this!

I could increase the default number of threads in Argos Translate if that would be useful. Or I could take the number of threads as a configuration option and then pass it to CTranslate2.

+1 for a configuration option.

I think you want arguments for both inter_threads and intra_threads, as they’re both useful in differing contexts. It looks like GPUs, especially multiple GPUs, introduce extra complexity. I’m not the best with user interface design/human factors, so I’m not quite sure of the best way to handle it: Multithreading and parallelism — CTranslate2 4.3.1 documentation
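
To make the difference concrete: as I read the CTranslate2 docs, inter_threads is the number of translations that can run in parallel and intra_threads is the number of threads used per translation, so the total is roughly inter_threads * intra_threads. The sketch below is one simple way to split a thread budget between the two. The function name and the heuristic are my own assumptions, not anything from Argos Translate or CTranslate2.

```python
def split_threads(total_threads: int, concurrent_jobs: int) -> tuple:
    """Split a CPU thread budget into (inter_threads, intra_threads).

    One parallel translation slot per expected concurrent job, capped at the
    thread budget; the remaining budget is divided evenly between the slots.
    """
    if total_threads < 1 or concurrent_jobs < 1:
        raise ValueError("total_threads and concurrent_jobs must be >= 1")
    inter_threads = min(concurrent_jobs, total_threads)
    intra_threads = max(1, total_threads // inter_threads)
    return inter_threads, intra_threads


# A single large job on a 16-thread CPU gets all threads for one translation.
print(split_threads(16, 1))
# Four parallel requests share the same budget.
print(split_threads(16, 4))
```

A single large job favors a high intra_threads, while many small parallel requests favor a higher inter_threads.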

I added a configuration option for inter_threads and intra_threads to the Argos Translate source:

This should be available with Argos Translate 1.10.

export ARGOS_INTER_THREADS="4"
export ARGOS_INTRA_THREADS="6"
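
A minimal sketch of how an application could read these two variables and turn them into CTranslate2 arguments. This is not the actual Argos Translate implementation. The function name and the fallback values (1 and 0, which I believe match the CTranslate2 constructor defaults) are assumptions.

```python
import os
from typing import Mapping


def threads_from_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Read the thread settings from the environment.

    Falls back to inter_threads=1 and intra_threads=0 when a variable is
    unset (assumed defaults for this sketch).
    """
    return {
        "inter_threads": int(environ.get("ARGOS_INTER_THREADS", "1")),
        "intra_threads": int(environ.get("ARGOS_INTRA_THREADS", "0")),
    }


# With the two exports above in place, this prints the values 4 and 6.
print(threads_from_env({"ARGOS_INTER_THREADS": "4", "ARGOS_INTRA_THREADS": "6"}))
```

The resulting dictionary could then be passed as keyword arguments when constructing a ctranslate2.Translator.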