Performance/benchmark data?

Hello,

I was wondering if anyone has any performance or benchmark information available. It doesn't need to be for specific hardware; for example, how many one-paragraph requests can a server of a certain size (assuming no GPU) handle in a reasonable amount of time (5-6 seconds or less)?

I guess the kind of information I am looking for is what kind of performance you can expect from a system similar to the following, if anyone happens to know:

2x Xeon E5-2660 v3 with 256 GB DDR4?

These servers can be had for around 500-600 USD on eBay, which is why I went with this example. It's a pretty specific ask, but honestly I would be happy to see any kind of data for running on a physical box rather than in a container/pod.


CTranslate2 publishes some benchmarks you can look at (Argos Translate uses int8 quantization). The CTranslate2 benchmarks don't include the time to run the Stanza sentence boundary detection step in Argos Translate or the LibreTranslate application overhead, so LibreTranslate will probably be roughly 2x slower than the CTranslate2 numbers.

I don't think we've done many benchmarks for LibreTranslate end to end. As a heuristic, I would estimate LibreTranslate does ~3 sentences/second on mid-range CPUs and 15-20 sentences/second on high-end CPUs.
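Applying that heuristic to the original question as a back-of-the-envelope estimate (the ~5 sentences per paragraph figure is an assumption, not a measurement):

```python
# Rough single-request latency for a one-paragraph translation,
# using the sentences/second heuristics above. Paragraph length
# (5 sentences) and the high-end midpoint (17.5) are assumptions.
sentences_per_paragraph = 5

for label, sentences_per_second in [("mid-range CPU", 3), ("high-end CPU", 17.5)]:
    seconds = sentences_per_paragraph / sentences_per_second
    print(f"{label}: ~{seconds:.1f} s per paragraph")
```

Either way that would land well inside the 5-6 second budget mentioned above, if the heuristic holds.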

Adding benchmarking automation could be a good feature. If we had a standard script for benchmarking LibreTranslate instances, people could submit data for specific hardware that we could publish.


If this became a standard feature, or a script could be written for it, I could run it in various environments to give us all an idea, or at least a starting point to reference. To make this work you would probably have to disable caching (if caching is even a thing, which it may not be?).


Here’s a basic benchmarking script:

import time

from libretranslatepy import LibreTranslateAPI

lt = LibreTranslateAPI("https://translate.argosopentech.com/")


def f():
    # Translate a short English sentence to Spanish
    return lt.translate("LibreTranslate is awesome!", "en", "es")


def timed_f():
    # Time a single translation request in seconds;
    # perf_counter is preferred over time.time for benchmarking
    start = time.perf_counter()
    f()
    return time.perf_counter() - start


# Print the translation once to confirm the server is responding
print(f())

num_trials = 10
trials = []
for _ in range(num_trials):
    time_in_seconds = timed_f()
    trials.append(time_in_seconds)
    print(time_in_seconds)

print(f"\nAverage time: {sum(trials) / len(trials)} seconds")

Example output:

LibreTranslate es impresionante!
0.18764233589172363
0.18553662300109863
0.16850900650024414
0.18801355361938477
0.17693042755126953
0.19587469100952148
0.1862797737121582
0.18496179580688477
0.21926236152648926
0.18032336235046387

Average time: 0.18733339309692382 seconds

LibreTranslate can use parallel CPU cores pretty well, so you could also test how many requests it can handle at once. Just be careful not to overload other people's servers.


Yeah, knowing the requests per second for a few different hosts would probably be a lot more useful, because if you are only running one request at a time the result should always be about the same, assuming you aren't running on a truly ancient machine. Is there any chance the script could be modified for that? Maybe something like an argument that lets you set how many requests per second to send. My Python isn't super strong, otherwise I would modify it myself.

I intend to run it on my own resources, not public APIs, so finding the point where the server starts to lag terribly or crashes is totally acceptable.


Requests per second would be:
len(trials) / sum(trials)
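For measuring throughput under load rather than sequentially, here's a rough sketch using a thread pool to keep several requests in flight at once. The worker count, request count, and localhost URL are placeholders, not recommendations; point it at your own instance:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(request_fn, num_requests, num_workers):
    """Call request_fn num_requests times across num_workers threads
    and return the overall requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map blocks until all submitted calls complete
        list(executor.map(lambda _: request_fn(), range(num_requests)))
    elapsed = time.perf_counter() - start
    return num_requests / elapsed


# Example usage against your own instance (hypothetical URL and settings):
#
#   from libretranslatepy import LibreTranslateAPI
#   lt = LibreTranslateAPI("http://localhost:5000/")
#   rps = measure_throughput(
#       lambda: lt.translate("LibreTranslate is awesome!", "en", "es"),
#       num_requests=40,
#       num_workers=8,
#   )
#   print(f"Throughput: {rps:.2f} requests/second")
```

Increasing `num_workers` until throughput stops improving should give an idea of where a given box saturates.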