LibreTranslate under load

skamenetskiy · January 18, 2023, 9:23am

Currently trying out LibreTranslate for real-time translations. We’ve built an image with 4 language models pre-installed and deployed the following setup:

2 data centers × 32 pods each (16 CPU / 32 GB RAM per pod), 64 pods total

We then ran a load test with the following profile: ramp from 0 to 110 rps over 20 minutes, then hold 110 rps for 5 hours.
We got the test texts from real data: HTML text up to 15 KB (avg 2-3 KB).
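For anyone wanting to reproduce the profile, here is a minimal sketch of the schedule described above. It assumes the ramp is linear (the post does not say); `target_rps` and `total_requests` are illustrative names, not part of any load-testing tool.

```python
RAMP_SECONDS = 20 * 60       # ramp from 0 to peak over 20 minutes
HOLD_SECONDS = 5 * 60 * 60   # then hold the peak for 5 hours
PEAK_RPS = 110.0


def target_rps(elapsed_seconds: float) -> float:
    """Target request rate at a given time since the test started.

    Assumption: the ramp is linear; the original post only gives the endpoints.
    """
    if elapsed_seconds < 0 or elapsed_seconds > RAMP_SECONDS + HOLD_SECONDS:
        return 0.0
    if elapsed_seconds < RAMP_SECONDS:
        return PEAK_RPS * elapsed_seconds / RAMP_SECONDS
    return PEAK_RPS


def total_requests() -> float:
    """Area under the rate curve: ramp triangle plus hold rectangle."""
    return 0.5 * PEAK_RPS * RAMP_SECONDS + PEAK_RPS * HOLD_SECONDS


if __name__ == "__main__":
    print(target_rps(10 * 60))   # rate halfway through the ramp
    print(total_requests())      # requests sent over the whole test
```

Under that assumption the whole run sends roughly 2 million requests, which is useful when sizing the pool of sample texts.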

The results:

At 110 rps, the 99th-percentile response time was 4-5 seconds.
The number of in-flight translations went above 200.
The logs started flooding with warnings like "WARNING:waitress.queue:Task queue depth is 10".
We detected a possible memory leak. After about 4 hours of load, the pods were OOM-killed and restarted because memory usage went above the provisioned 32 GB.
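As a rough sanity check on these numbers, Little's law (in-flight = arrival rate × mean time in system) relates the in-flight count to the mean latency. The sketch below assumes the 200 in-flight figure is the total across all 64 pods, that load is spread evenly, and that the system was in a steady state; none of that is stated above, so treat the output as an estimate only. The function names are illustrative.

```python
PODS = 64          # 2 data centers x 32 pods
PEAK_RPS = 110.0   # steady-state request rate from the load test
IN_FLIGHT = 200.0  # assumption: total across all pods, not per pod


def mean_latency_seconds(in_flight: float, rps: float) -> float:
    """Little's law rearranged: W = L / lambda."""
    return in_flight / rps


def per_pod(value: float, pods: int = PODS) -> float:
    """Even split across pods (assumes perfectly balanced load)."""
    return value / pods


if __name__ == "__main__":
    print(f"mean latency ~ {mean_latency_seconds(IN_FLIGHT, PEAK_RPS):.2f} s")
    print(f"rps per pod ~ {per_pod(PEAK_RPS):.2f}")
    print(f"in-flight per pod ~ {per_pod(IN_FLIGHT):.2f}")
```

Under those assumptions the mean latency comes out at about 1.8 seconds, with under 2 rps and about 3 concurrent translations per pod.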

Has anyone done something similar? Or maybe someone has a real-world example of LibreTranslate under load?

pierotofy · January 19, 2023, 4:41am

In production I’d recommend running the server with gunicorn, which can restart its worker processes once in a while to work around memory leaks. And yes, there’s a memory leak somewhere, so if you don’t restart the process once in a while it will eventually OOM.
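A minimal sketch of what that could look like as a gunicorn config file (gunicorn config files are plain Python). `max_requests` and `max_requests_jitter` are real gunicorn settings that recycle a worker after it has handled a given number of requests. The specific values, the port, and the `wsgi:app` entry point in the comment are assumptions to adapt to your deployment, not recommendations from this thread.

```python
# gunicorn.conf.py
# Start with something like: gunicorn -c gunicorn.conf.py 'wsgi:app'
# (the 'wsgi:app' entry point is an assumption; check your LibreTranslate version)

bind = "0.0.0.0:5000"      # assumed port; adjust to your setup

# Each worker is a separate process and may load its own copy of the models,
# so memory use can scale with the worker count. Value is a placeholder.
workers = 4

# Allow slow translations of large HTML inputs to finish. Placeholder value.
timeout = 120

# Recycle a worker after it has served this many requests, which bounds how
# much memory a slow leak can accumulate in any one process.
max_requests = 1000

# Add a random offset (0..jitter) per worker so that workers do not all
# restart at the same moment.
max_requests_jitter = 100
```

The right `max_requests` value depends on how fast memory grows per request, so it is worth watching pod memory over a long test before settling on a number.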

What are you planning to use LibreTranslate for (and who’s “we”)? Just curious.