LibreTranslate under load

We’re currently trying out LibreTranslate for real-time translations. We built an image with 4 language models pre-installed and deployed the following setup:

2 data centers × 32 pods (16 CPU / 32 GB RAM each), 64 pods total

We then ran a load test that ramped from 0 to 110 rps over 20 minutes, then held 110 rps for 5 hours.
The request texts came from real data: HTML text up to 15 KB (2-3 KB on average).
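To make the load profile concrete, here is a minimal sketch of the target request rate over time and the total request volume it implies. The constant and function names are hypothetical; the only inputs are the numbers above (20-minute linear ramp to 110 rps, 5-hour hold, 64 pods), and it assumes the ramp is linear.

```python
RAMP_SECONDS = 20 * 60       # linear ramp from 0 to peak over 20 minutes (assumed linear)
HOLD_SECONDS = 5 * 60 * 60   # then hold the peak rate for 5 hours
PEAK_RPS = 110               # peak request rate
PODS = 64                    # 2 data centers x 32 pods


def target_rps(t: float) -> float:
    """Target request rate (rps) at t seconds into the test."""
    if t < 0:
        return 0.0
    if t < RAMP_SECONDS:
        # Linear interpolation during the ramp.
        return PEAK_RPS * t / RAMP_SECONDS
    if t <= RAMP_SECONDS + HOLD_SECONDS:
        return float(PEAK_RPS)
    return 0.0  # test is over


def total_requests() -> float:
    """Area under the rate curve: triangle for the ramp plus rectangle for the hold."""
    ramp = 0.5 * RAMP_SECONDS * PEAK_RPS
    hold = HOLD_SECONDS * PEAK_RPS
    return ramp + hold


if __name__ == "__main__":
    print(f"per-pod rate at peak: {PEAK_RPS / PODS:.2f} rps")
    print(f"total requests: {total_requests():,.0f}")
```

Running it prints a per-pod peak rate of about 1.72 rps and roughly 2.05 million requests for the whole test, assuming the load balancer spreads traffic evenly.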

The results:

  • At 110 rps, the p99 response time was 4-5 seconds.
  • The number of in-flight translations went above 200.
  • The logs started flooding with warnings like "WARNING:waitress.queue:Task queue depth is 10".
  • We detected a possible memory leak: after about 4 hours of load, the pods were OOM-killed and restarted because memory usage exceeded the 32 GB limit.
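The in-flight count and the request rate are linked by Little's law (L = λW): with more than 200 requests in flight at 110 rps, the mean latency must be at least about 1.8 s, which fits with a p99 of 4-5 s. The sketch below runs that arithmetic and, assuming perfectly even load balancing (an assumption, not a measurement), estimates per-pod load and how many requests each pod served before the OOM restart.

```python
PEAK_RPS = 110    # arrival rate (lambda) from the load test
IN_FLIGHT = 200   # observed in-flight requests (L); the real value was "above 200"
PODS = 64
HOURS_TO_OOM = 4  # "after about 4 hours of load"

# Little's law: L = lambda * W  =>  W = L / lambda (a lower bound, since L > 200)
mean_latency_s = IN_FLIGHT / PEAK_RPS

# Assumes traffic is spread evenly across all pods.
per_pod_rps = PEAK_RPS / PODS
per_pod_in_flight = IN_FLIGHT / PODS

# Rough request count per pod before the OOM restart (ignores the ramp-up phase).
requests_before_oom = per_pod_rps * HOURS_TO_OOM * 60 * 60

print(f"implied mean latency: >= {mean_latency_s:.2f} s")
print(f"per pod: {per_pod_rps:.2f} rps, {per_pod_in_flight:.1f} requests in flight")
print(f"requests per pod before OOM: ~{requests_before_oom:,.0f}")
```

The per-pod request count (about 24,750) gives a rough scale for how often a leaking process would need to be recycled to stay under the memory limit.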

Has anyone done something similar? Or does anyone have a real-world example of LibreTranslate under load?


In production I’d recommend running the server with gunicorn, which can restart its worker processes once in a while to contain memory leaks. And yes, there’s a memory leak somewhere, so if the process is never restarted it will eventually run out of memory (OOM).
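A minimal sketch of a gunicorn config file that recycles workers periodically, using gunicorn's `max_requests` and `max_requests_jitter` settings. The specific values, the file name, and the `wsgi:app` module path (taken to match LibreTranslate's WSGI instructions) are assumptions to adjust, not tuned recommendations.

```python
# gunicorn.conf.py -- hypothetical starting point, values are NOT tuned.
# Assumed invocation:  gunicorn -c gunicorn.conf.py 'wsgi:app'

bind = "0.0.0.0:5000"

# Each worker is a separate process and likely loads its own copy of the
# language models, so memory grows with the worker count (measure this).
workers = 2

# Threads per worker; with the default sync worker, threads > 1 switches
# gunicorn to the gthread worker class.
threads = 4

# Restart a worker after it has handled this many requests, which caps how
# much a slow leak can accumulate before the process is replaced.
max_requests = 5000

# Add a random 0..jitter offset so workers don't all restart at the same time.
max_requests_jitter = 500

# Seconds before a silent worker is killed; long HTML texts may need more.
timeout = 60
```

The jitter matters under sustained load: without it, workers started together hit `max_requests` together and the pod briefly has no free worker.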

What are you planning to use LibreTranslate for (and who is “we”)? Just curious.