Hi! Thanks for the great projects. I’m self-hosting and running into an issue: despite having an 8-core/16-thread CPU, I can’t get LibreTranslate/Argos Translate to use more than 4 threads at once. I’ve tried submitting large single jobs, as well as many smaller jobs in parallel through the REST API. RAM utilization does not appear to be a limiting factor. LT_THREADS is set to 16, and htop seems to indicate that I am spinning up that many threads.
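For anyone trying to reproduce the parallel-request test, here is a minimal standard-library sketch. It assumes a local server at http://localhost:5000 that does not require an API key, posts JSON with the `q`, `source`, `target` and `format` fields to `/translate`, and reads `translatedText` from the response. The en to es language pair and the concurrency of 16 are arbitrary choices; while it runs, htop shows how many cores are actually busy.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local LibreTranslate server on the default port, no API key.
BASE_URL = "http://localhost:5000"


def build_request(text: str, source: str = "en", target: str = "es") -> urllib.request.Request:
    """Build a POST /translate request with a JSON body."""
    body = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/translate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str) -> str:
    """Send one request and return the translated text."""
    with urllib.request.urlopen(build_request(text), timeout=300) as resp:
        return json.load(resp)["translatedText"]


def load_test(texts: list[str], concurrency: int = 16) -> list[str]:
    """Submit all texts, keeping up to `concurrency` requests in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(translate, texts))


# Usage (needs a running server):
#   load_test([f"This is test sentence number {i}." for i in range(64)])
```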
Checking my understanding a bit here:
- The LT_THREADS argument appears to be for the web server and has no effect on the Argos Translate/CTranslate2 translator?
- CTranslate2 supports setting the number of threads a translator uses, but I don’t see a way to set that from LibreTranslate or Argos Translate.
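For reference, this is roughly what passing thread counts to CTranslate2 looks like from Python. `ctranslate2.Translator` accepts `device`, `inter_threads` and `intra_threads` keyword arguments (with `intra_threads=0` meaning "use the library default"). The helper names, the validation and the CPU-only device choice are illustrative assumptions, not code from Argos Translate or LibreTranslate, and `model_dir` is assumed to point to an already converted CTranslate2 model.

```python
def translator_kwargs(inter_threads: int = 1, intra_threads: int = 0) -> dict:
    """Validate thread settings and return keyword arguments for ctranslate2.Translator."""
    if inter_threads < 1:
        raise ValueError("inter_threads must be >= 1")
    if intra_threads < 0:
        raise ValueError("intra_threads must be >= 0 (0 lets CTranslate2 pick a default)")
    return {"device": "cpu", "inter_threads": inter_threads, "intra_threads": intra_threads}


def make_translator(model_dir: str, inter_threads: int = 1, intra_threads: int = 0):
    """Load a converted CTranslate2 model on the CPU with explicit thread settings."""
    import ctranslate2  # lazy import so translator_kwargs works without CTranslate2 installed

    return ctranslate2.Translator(model_dir, **translator_kwargs(inter_threads, intra_threads))
```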
I’ve tried running LibreTranslate both via pip and via Docker and have observed the same 4-thread limit with both. What am I missing here? Is the intention to spin up multiple LibreTranslate services and round-robin between them via a reverse proxy? Or should I be seeing more utilization than I currently am when submitting large numbers of parallel requests to a single LibreTranslate server?
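If running several instances turns out to be the answer, the distribution logic itself is simple: a reverse proxy such as nginx balances an upstream group round-robin by default. The client-side equivalent is sketched here; the three instances on ports 5001 to 5003 are a hypothetical setup, and this sketch does not claim that multiple instances are the intended deployment model.

```python
import itertools
import threading


class RoundRobin:
    """Hand out backend base URLs in rotation; safe to call from multiple threads."""

    def __init__(self, backends: list[str]):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()  # itertools.cycle is not guaranteed thread-safe

    def next_backend(self) -> str:
        with self._lock:
            return next(self._cycle)


# Hypothetical setup: three LibreTranslate instances on ports 5001, 5002 and 5003.
pool = RoundRobin([f"http://localhost:{port}" for port in (5001, 5002, 5003)])
```

Each request would then be sent to `pool.next_backend() + "/translate"`.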
Sorry if I’m missing something, but `gunicorn --bind 0.0.0.0:5000 'wsgi:app(threads="16")'` isn’t changing the behavior for me. I’m not seeing an argument for “workers”.
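A note on “workers”: that setting belongs to gunicorn itself (`-w`/`--workers`, e.g. `gunicorn --workers 4 --bind 0.0.0.0:5000 'wsgi:app(threads="16")'`), not to LibreTranslate’s app factory. The same thing can go in a `gunicorn.conf.py`, sketched below. Each worker is a separate process that loads its own copy of the app, presumably including its own translation models, so this trades RAM for parallelism. The "a quarter of the CPU threads" rule (4 workers on a 16-thread CPU) is an arbitrary assumption, and whether every LibreTranslate feature behaves correctly across multiple worker processes is worth checking.

```python
# gunicorn.conf.py: a minimal sketch of gunicorn settings (a config file is plain Python).
# Run with: gunicorn -c gunicorn.conf.py 'wsgi:app(threads="16")'
import multiprocessing

bind = "0.0.0.0:5000"

# Each worker is a separate process with its own copy of the app, so RAM grows per worker.
# Assumption: a quarter of the logical CPUs, but never fewer than one worker.
workers = max(1, multiprocessing.cpu_count() // 4)
```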
I could increase the default number of threads in Argos Translate if that would be useful. Or I could take the number of threads as a configuration option and then pass it to CTranslate2.
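Taking the thread counts as a configuration option could look something like this. The environment variable names `TRANSLATE_INTER_THREADS` and `TRANSLATE_INTRA_THREADS` are hypothetical; neither Argos Translate nor LibreTranslate is assumed to read them. Out-of-range values fall back to the defaults instead of failing at startup, which is one design choice among several.

```python
import os


def read_thread_settings(env=None) -> tuple[int, int]:
    """Return (inter_threads, intra_threads) from hypothetical environment variables.

    TRANSLATE_INTER_THREADS / TRANSLATE_INTRA_THREADS are illustrative names only.
    """
    env = os.environ if env is None else env
    inter = int(env.get("TRANSLATE_INTER_THREADS", "1"))
    intra = int(env.get("TRANSLATE_INTRA_THREADS", "0"))  # 0 = let CTranslate2 decide
    # Clamp nonsensical values back to the defaults rather than crashing.
    return max(1, inter), max(0, intra)


# The result would then be passed through, e.g.:
#   inter, intra = read_thread_settings()
#   ctranslate2.Translator(model_dir, device="cpu", inter_threads=inter, intra_threads=intra)
```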
I think you want arguments for both inter_threads and intra_threads, as they’re useful in different contexts. GPUs, especially multiple GPUs, appear to introduce extra complexity. I’m not the best with user interface design/human factors, so I’m not sure of the best way to handle it. See the “Multithreading and parallelism” page of the CTranslate2 4.3.1 documentation.
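As I read the CTranslate2 page mentioned above, intra_threads is the number of computation threads per batch, inter_threads is the number of batches that can run in parallel, and the total number of compute threads is roughly inter_threads × intra_threads. A single large job mostly benefits from intra_threads, while many concurrent requests can benefit from inter_threads, provided the translations are actually submitted concurrently. The helper below is my own heuristic for splitting a CPU budget between the two, not something taken from CTranslate2.

```python
def split_threads(total_threads: int, parallel_batches: int) -> tuple[int, int]:
    """Split a CPU thread budget into (inter_threads, intra_threads).

    Keeps inter_threads * intra_threads <= total_threads, assuming CTranslate2
    launches about that many compute threads in total.
    """
    if total_threads < 1 or parallel_batches < 1:
        raise ValueError("total_threads and parallel_batches must be >= 1")
    inter = min(parallel_batches, total_threads)  # never more parallel batches than threads
    intra = total_threads // inter                # divide the rest evenly per batch
    return inter, intra
```

On the 16-thread CPU from the first post, `split_threads(16, 1)` favors one large job with `(1, 16)`, while `split_threads(16, 4)` gives `(4, 4)` for a steady stream of parallel requests.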