Unable to task more than 4 threads

buffalo.bill · July 31, 2024, 2:08pm

Hi! Thanks for the great projects. I’m self-hosting and having an issue where, despite having an 8-core/16-thread CPU, I’m unable to get LibreTranslate/Argos Translate to use more than 4 threads at once. I’ve tried submitting large single jobs, as well as multiple smaller jobs in parallel through the REST API. RAM utilization does not appear to be a limiting factor. LT_THREADS is set to 16, and htop seems to indicate that I am spinning up that many threads.

Checking my understanding a bit here:

The LT_THREADS argument appears to be for the web server and has no effect on the Argos/CTranslate2 translation device?
- LibreTranslate/libretranslate/main.py at main · LibreTranslate/LibreTranslate · GitHub
CTranslate2 supports setting the number of threads a translator device uses, but I don’t see a way to set that from LibreTranslate or Argos Translate.
- argos-translate/argostranslate/translate.py at master · argosopentech/argos-translate · GitHub

I’ve tried running LibreTranslate both via pip and Docker and have observed the same 4-thread limit with both. What am I missing here? Is the intention to spin up multiple LibreTranslate services and round-robin across them via a reverse proxy? Or should I be seeing more utilization than I currently am when submitting large numbers of parallel requests to a single LibreTranslate server?

Thanks!

pierotofy · August 2, 2024, 4:28am

Try running with gunicorn and choose a suitable number of workers: GitHub - LibreTranslate/LibreTranslate: Free and Open Source Machine Translation API. Self-hosted, offline capable and easy to setup.

buffalo.bill · August 2, 2024, 2:54pm

Sorry if I’m missing something, but gunicorn --bind 0.0.0.0:5000 'wsgi:app(threads="16")' isn’t changing the behavior for me. I’m not seeing an argument for “workers”.

It looks like gunicorn uses the same get_args function: LibreTranslate/scripts/gunicorn_conf.py at main · LibreTranslate/LibreTranslate · GitHub. That function feeds threads into serve(), not into the number of threads used by the translation device, which is 4 by default: Multithreading and parallelism — CTranslate2 4.3.1 documentation

I’m sure I’m missing something obvious, thanks for your help so far!

buffalo.bill · August 5, 2024, 9:23pm

Adding intra_threads=16 to the ctranslate2.Translator(...) call here allowed me to use all 16 threads of my CPU.

argosopentech/argos-translate/blob/master/argostranslate/translate.py#L168


      
          def __init__(self, from_lang: Language, to_lang: Language, pkg: Package):
              self.from_lang = from_lang
              self.to_lang = to_lang
              self.pkg = pkg
              self.translator = None
              self.sentencizer = SpacySentencizerSmall()
          
          def hypotheses(self, input_text: str, num_hypotheses: int = 4) -> list[Hypothesis]:
              if self.translator is None:
                  model_path = str(self.pkg.package_path / "model")
                  self.translator = ctranslate2.Translator(model_path, device=settings.device)
              paragraphs = ITranslation.split_into_paragraphs(input_text)
              info("paragraphs:", paragraphs)
              translated_paragraphs = []
              for paragraph in paragraphs:
                  translated_paragraphs.append(
                      apply_packaged_translation(
                          self.pkg, paragraph, self.translator, num_hypotheses, self.sentencizer
                      )
                  )
              info("translated_paragraphs:", translated_paragraphs)
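
For reference, a minimal sketch of the kind of change described above. It assumes the CTranslate2 Python package is installed; `build_translator_kwargs` and `make_translator` are hypothetical helper names used only for illustration and are not part of Argos Translate. The `inter_threads` and `intra_threads` keyword arguments are the ones documented for `ctranslate2.Translator`.

```python
import os


def build_translator_kwargs(device="cpu", inter_threads=1, intra_threads=None):
    """Collect keyword arguments for ctranslate2.Translator.

    Hypothetical helper: if intra_threads is not given, fall back to the
    number of logical CPUs reported by the OS (or 4 if that is unknown).
    """
    if intra_threads is None:
        intra_threads = os.cpu_count() or 4
    return {
        "device": device,
        "inter_threads": inter_threads,  # batches translated in parallel
        "intra_threads": intra_threads,  # threads used per translation
    }


def make_translator(model_path, **overrides):
    """Create a translator with explicit thread settings (sketch only)."""
    import ctranslate2  # imported lazily so the helper above has no hard dependency

    return ctranslate2.Translator(model_path, **build_translator_kwargs(**overrides))
```

With this sketch, `make_translator(model_path, intra_threads=16)` would correspond to the one-line edit described in the post.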

argosopentech · August 6, 2024, 12:26am

Thanks for looking into this!

I could increase the default number of threads in Argos Translate if that would be useful. Or I could take the number of threads as a configuration option and then pass it to CTranslate2.

pierotofy · August 6, 2024, 4:15pm

+1 for a configuration option.

buffalo.bill · August 6, 2024, 5:44pm

I think you want arguments for both inter_threads and intra_threads, as they’re both useful in different contexts. It looks like GPUs, especially multiple GPUs, introduce extra complexity. I’m not the best with user interface design/human factors, so I’m not sure of the best way to handle it: Multithreading and parallelism — CTranslate2 4.3.1 documentation

argosopentech · August 15, 2024, 12:50pm

I added configuration options for inter_threads and intra_threads to the Argos Translate source.

These should be available in Argos Translate 1.10:

export ARGOS_INTER_THREADS="4"
export ARGOS_INTRA_THREADS="6"
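
As a rough illustration of how these two variables relate to CPU usage, here is a minimal sketch. `read_thread_settings` is a hypothetical function, and the fallback values of 1 and 4 are assumptions made for this example; this is not the actual Argos Translate implementation. The only fact relied on is from the CTranslate2 documentation: the total number of computing threads is inter_threads multiplied by intra_threads.

```python
import os


def read_thread_settings(env=None):
    """Read the thread settings from environment variables.

    Hypothetical helper; the fallback values (1 and 4) are assumptions
    for this sketch, not confirmed Argos Translate defaults.
    """
    env = os.environ if env is None else env
    inter_threads = int(env.get("ARGOS_INTER_THREADS", "1"))
    intra_threads = int(env.get("ARGOS_INTRA_THREADS", "4"))
    return inter_threads, intra_threads


# Using the values from the export lines above.
inter, intra = read_thread_settings(
    {"ARGOS_INTER_THREADS": "4", "ARGOS_INTRA_THREADS": "6"}
)
total_threads = inter * intra  # 4 parallel batches x 6 threads each
print(f"inter={inter} intra={intra} total={total_threads}")
```

On a 16-thread CPU it may make sense to choose values whose product does not exceed 16, for example 1 and 16 for large single jobs or 4 and 4 for many parallel requests.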