LibreTranslate running with Gunicorn and Nginx not freeing up threads

I have hosted LibreTranslate on Ubuntu 20.04 by following the tutorials at
https://github.com/LibreTranslate/LibreTranslate and https://github.com/argosopentech/LibreTranslate-init.

Initially, the application runs perfectly fine. However, after running for a certain amount of time, the app starts returning 500 Internal Server Error. I investigated and found that new threads are created in the Gunicorn workers with each request, but they are not terminated after the request has been processed. Over a long run this becomes a problem, because eventually there are no resources left to create more threads. I am not sure whether this is an issue with the library or with Gunicorn.

I have set the Gunicorn worker count to 4.

By the time I start receiving 500 errors, each worker has around 18k threads. I used the following command to get the thread count:

watch ps -o thcount <pid>
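To log the thread count over time instead of watching it interactively, the same number can be read from a script. A minimal sketch, assuming Linux: it reads the `Threads:` line of `/proc/<pid>/status`, which should report the same per-process thread count that `ps -o thcount` shows. The function name `thread_count` is hypothetical.

```python
import os
import sys


def thread_count(pid):
    """Return the kernel's thread count for `pid` (Linux only), taken from
    the "Threads:" line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError(f"no Threads: field found for pid {pid}")


if __name__ == "__main__":
    # Pass one or more Gunicorn worker PIDs; defaults to this script's own PID.
    pids = sys.argv[1:] or [str(os.getpid())]
    for pid in pids:
        print(pid, thread_count(pid))
```

Running it periodically (for example from cron) against each worker PID would show whether the count climbs steadily with traffic or jumps under load.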

These are the Gunicorn error logs:

[2022-05-11 16:04:19 +0100] [553213] [ERROR] Error handling request /detect
Traceback (most recent call last):
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/support/LibreTranslate/wsgi.py", line 14, in app
    instance = main()
  File "/home/support/LibreTranslate/app/main.py", line 121, in main
    app = create_app(args)
  File "/home/support/LibreTranslate/app/app.py", line 113, in create_app
    remove_translated_files.setup(get_upload_dir())
  File "/home/support/LibreTranslate/app/remove_translated_files.py", line 23, in setup
    scheduler.start()
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/apscheduler/schedulers/background.py", line 38, in start
    self._thread.start()
  File "/usr/lib/python3.8/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

ERROR:apscheduler.scheduler:Error submitting job "remove_translated_files (trigger: interval[0:30:00], next run at: 2022-05-12 05:56:34 BST)" to executor "default"
Traceback (most recent call last):
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 978, in _process_jobs
    executor.submit_job(job, run_times)
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job
    self._do_submit_job(job, run_times)
  File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job
    f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name)
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 188, in submit
    self._adjust_thread_count()
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 213, in _adjust_thread_count
    t.start()
  File "/usr/lib/python3.8/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
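The first traceback shows `app()` in `wsgi.py` calling `main()` while handling a request, which calls `create_app()`, which calls `remove_translated_files.setup()`, which starts an APScheduler background scheduler thread. If that chain really runs on every request (an assumption based only on the traceback; `wsgi.py` itself is not shown here), each request would leave one more long-lived scheduler thread behind, which would match the steady growth in thread count. The sketch below reproduces that pattern with the standard library only; `BackgroundWorker`, `create_app`, `app_per_request`, `app_cached` and `thread_growth` are hypothetical stand-ins, not LibreTranslate's or APScheduler's code.

```python
import threading


class BackgroundWorker:
    """Hypothetical stand-in for a background scheduler: start() spawns one
    daemon thread that keeps running until shutdown() is called."""

    def __init__(self):
        self._stop = threading.Event()
        # The thread simply blocks until the stop event is set.
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._stop.set()
        self._thread.join()


def create_app():
    # Hypothetical factory mirroring create_app() -> scheduler.start().
    worker = BackgroundWorker()
    worker.start()
    return worker


def app_per_request(environ, start_response):
    # Leaky: builds a new app, and therefore starts a new thread, per request.
    create_app()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


_app = None
_lock = threading.Lock()


def app_cached(environ, start_response):
    # Builds the app once per worker process and reuses it afterwards.
    global _app
    with _lock:
        if _app is None:
            _app = create_app()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def thread_growth(wsgi_app, requests):
    """Call wsgi_app `requests` times and return how many threads were added."""
    before = threading.active_count()
    for _ in range(requests):
        wsgi_app({}, lambda status, headers: None)
    return threading.active_count() - before


if __name__ == "__main__":
    print("per-request:", thread_growth(app_per_request, 100))
    print("cached:", thread_growth(app_cached, 100))
```

With the per-request version the thread count grows by one for every call; with the cached version it grows by one the first time and then stays flat.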

Of all requests, around 30% returned 200 OK and the rest failed with 500 Internal Server Error because no new threads could be created.

Below is my system configuration:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       8
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:                        2
CPU MHz:                         2294.686
BogoMIPS:                        4589.37
Hypervisor vendor:               VMware
Virtualisation type:             full
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        2 MiB
L3 cache:                        200 MiB
NUMA node0 CPU(s):               0-7

I have uploaded both the Gunicorn access and error logs here: gunicorn_logs.zip (ufile.io)


Thanks for the detailed bug report, I made an issue in the LibreTranslate-init repo to track this.

I’m not sure exactly how new threads are supposed to be cleaned up between Gunicorn and Nginx, so I don’t know how this is happening.
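One detail that matters here: in CPython a running thread is not garbage collected just because nothing references its `Thread` object any more. It stays alive, and counts towards the process thread total, until its target function returns. So whatever keeps starting threads has to either stop them explicitly or avoid starting them repeatedly. A small sketch with hypothetical names (`start_orphan_thread`, `orphan_alive`):

```python
import gc
import threading


def start_orphan_thread(stop):
    """Start a thread and keep no reference to it once this function returns."""
    t = threading.Thread(target=stop.wait, daemon=True, name="orphan")
    t.start()


def orphan_alive():
    # threading.enumerate() lists every thread that is still running.
    return any(t.name == "orphan" for t in threading.enumerate())


stop = threading.Event()
start_orphan_thread(stop)
gc.collect()
print(orphan_alive())  # True: dropping the reference did not end the thread

stop.set()  # let the target function return
for t in threading.enumerate():
    if t.name == "orphan":
        t.join()
print(orphan_alive())  # False: the thread ended only once its function returned
```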

I’ve been running translate.argosopentech.com using LibreTranslate-init for a few months without a restart and haven’t run into this issue. Is it possible the server is just overloaded and failing in some instances? Also be aware that CTranslate2 creates additional threads when translating, which would increase the thread count.