LTEngine - LLM-powered local machine translation

You’ll get a noticeable speed-up using a GPU for LTEngine (in the 10x range, but it depends on the card, the LLM model, etc.).

2 Likes

Great! It sounds like you got LTEngine compiling on Linux.

2 Likes

Yes, it works just fine. On Linux distributions other than Ubuntu Server there is a good chance that more than just a minimum of libraries is preinstalled, so it might run right away without having to investigate what is missing.
I still need to write a systemd unit to start the servers after boot, and then the installation is finished. I will post it here for documentation… :slight_smile:
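
Something along these lines, probably (a minimal sketch; the service user, the paths, and the start command are assumptions and have to be adapted to the actual install):

    [Unit]
    Description=LTEngine translation server
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    # Hypothetical service user and install location; the ExecStart line
    # must be whatever command actually starts your LTEngine server.
    User=ltengine
    WorkingDirectory=/opt/ltengine
    ExecStart=/opt/ltengine/ltengine
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Then systemctl enable --now ltengine.service, plus a second copy of the unit for the LibreTranslate server.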

2 Likes

If you’re a low-budget startup looking for inexpensive infrastructure, I suggest loading the LibreTranslate models on CPU (a single core can handle 4 simultaneous requests with almost no delay) and the LLM for LTEngine on the smallest GPU available: an LLM (except perhaps a 7B and maybe a 14B model) will not run well on a CPU, whatever the amount of RAM you’ve got.

You can rent a GPU server on a monthly basis for about one thousand euros (L40S, 90 GB RAM, 46 GB VRAM) or two thousand (H100, 180 GB RAM, 80 GB VRAM), and from my experience a CPU-only server with 192 GB RAM is not much cheaper than a thousand euros per month. Those machines have enough CPU cores for LibreTranslate to run well up to a few thousand regular users.
As for buying, servers with Ada cards are pretty inexpensive (less than 100k for 4 GPUs); I don’t know about Hopper cards yet (awaiting a quote for H200 servers).

For testing and development, there is also a workstation GPU that replicates the L40S (or whichever Ada card is current): the higher-end RTX 6000 Ada (7000 now?). It’s about 10k€, so it amortizes very quickly, and anything you do with it will be consistent with what will happen on an L40S.
However, you cannot use it for prototyping something that will run on an H100 or H200, since the Hopper architecture has different features from the Ada architecture (we hit a snag on this one last month).

3 Likes

Yes, it depends on the traffic volume and on the translation-speed requirements.

I am still thinking about the most practical way to measure throughput and throughput reserves, and to calculate/estimate the current and future CPU/GPU needs.
Word and character counts plus elapsed time in microseconds, plus what else?
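
For a first pass I am thinking of something simple like this (a rough Perl sketch against the standard LibreTranslate-style /translate endpoint; host, port, and payload are assumptions to adapt):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);
    use LWP::UserAgent;
    use JSON::PP qw(encode_json);

    # Assumed local LibreTranslate/LTEngine endpoint; adjust host and port.
    my $url  = 'http://127.0.0.1:5000/translate';
    my $text = 'Die Sonne scheint rot, es ist warm und ich bin zufrieden.';

    my $ua = LWP::UserAgent->new;
    my $t0 = [gettimeofday];
    my $res = $ua->post(
        $url,
        Content_Type => 'application/json',
        Content      => encode_json({ q => $text, source => 'de', target => 'en' }),
    );
    my $elapsed = tv_interval($t0);    # seconds, with microsecond resolution

    my @words = split /\s+/, $text;
    my $chars = length $text;
    printf "chars=%d words=%d time=%.6fs chars/s=%.1f\n",
        $chars, scalar @words, $elapsed, $chars / $elapsed;

Logged over a day or two of real traffic, response time per request and characters per second should give a usable baseline for estimating reserves.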

I just started the test runs to make sure everything runs smoothly.
There were some small issues with the UTF-8 encoding (I don’t know why I need to utf8::decode() the result twice with LibreTranslate, but only once with LTEngine, etc.). But now it runs smoothly.

And I really need to look into the LTEngine source to see how to add multi-language output, so that a complete recalculation is not needed for every language combination. Too bad I have no clue about Rust; it’s a bit different…
For those who need translations into multiple languages this could be a great performance boost as well as an energy saving.

Timely response to requests is the most sensible metric: up to 1.6 s per one-thousand-character translation, requests flow through LibreTranslate; above that value, the pipeline clogs and unanswered requests start piling up.

I don’t know the corresponding threshold for LTEngine, but it’s probably in the same range (rapid-firing your typical LLM with one three-thousand-word prompt per second for half an hour is fine on an Ada GPU).

Thanks, everyone, for sharing your experiences. This is really exciting.

Is there a way to get an API key somewhere to test and pay for translation with the larger models? Given the significant cost of running this, I would like to do a comparison before investing in it. And like others here, we’re a small non-profit startup; we’d need to apply for a grant to do this at scale.

This all is awesome.

To ease my experimenting, I modded LTEngine so that my Perl script can feed it the prompt for Gemma directly and take the output back directly, without JSON and all that, so there is no need to do a cargo build for testing every variation of the prompt string.

If it is asked to translate one source into one target language, time usage is 100%; translating one source into 5 different target languages in one run takes 325%.
Less of a saving than I expected, but still better than the full 500% for 5 single-language translations.

1 Like

I could not sleep much last night, got up early, and continued tailoring the instruction string for Gemma.
It turned out this “AI” is very dumb when it comes to structured work. It is like normal programming, except that there is no specification of the programming “language”. It has a grasp of some popular things like HTML tags, but if you need more, you have to find a natural-language way of defining something like a regular expression. This is very yucky, as the slightest variation in the instruction “code” can turn sensible output into trash, albeit sometimes very funny trash.

Anyway, it now

  • takes a list of languages to translate to, and soon some other optional parameters, like formal/informal style, whether or not to translate quotes in other languages, etc.
  • keeps not only HTML tags but also user-specified tags __<somestring>__ untranslated (this is just my first idea of how to format the tags, not finally decided yet; I am not sure whether it is possible to pass Perl regexps to Gemma, I haven’t had success in my experiments yet); see the sketch after this list for the masking/restoring mechanics
  • produces cleanly formatted output that can easily be processed with Perl regexps
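
On the Perl side, the tag mechanics are roughly like this (a simplified sketch, not my actual script; the pattern and the placeholder format are just examples):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Replace substrings that must not be translated with numeric __N__
    # placeholders before building the prompt, and put the originals back
    # into the translated output afterwards (Gemma is instructed to keep
    # the placeholders untouched).
    my %protected;
    my $n = 0;

    sub protect {
        my ($text, @patterns) = @_;
        for my $re (@patterns) {
            $text =~ s/($re)/my $k = '__' . ++$n . '__'; $protected{$k} = $1; $k/ge;
        }
        return $text;
    }

    sub restore {
        my ($text) = @_;
        $text =~ s/(__\d+__)/exists $protected{$1} ? $protected{$1} : $1/ge;
        return $text;
    }

    my $src    = 'Die Sonne scheint <b>rot</b>, es ist warm.';
    my $masked = protect($src, qr/<[^>]+>/);   # protect HTML tags, for example
    print "$masked\n";                         # Die Sonne scheint __1__rot__2__, es ist warm.
    print restore($masked), "\n";              # back to the original text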

I am sure this will end up as a super, super nice translation API, at least for my exotic needs.

However, the actual natural-text “programming” is very sensitive to even the slightest variations, not only in the result but also in terms of computing time/cost.
I think that is where the biggest optimization potential lies; I will have to read and learn a lot…

Thank you so much @pierotofy @argosopentech! You are my de facto teachers on how to deal with LLMs. This stuff is just exciting :smiley:

2 Likes
Your role: You are an expert linguist, specializing in translation. You are able to capture the nuances of the languages you translate. You pay attention to masculine/feminine/plural, proper use of articles and grammar, bias, attitude, style and tone from informal to formal. You always provide natural sounding translations that fully preserve the meaning of the original text. In doubt you give accuracy preference over elegance. You never provide explanations for your work. You must preserve all HTML tags and elements in the translation. You always answer with the translations ordered, and nothing else.
Your instructions:
At the end of this document, there is a German text, from directly after the line consisting of 40 equal signs (=) to the end of the document.
It begins with a dot, a whitespace and a tag delimited by double underscores (__).
Keep this and all other similar tags unmodified where they are in the original text, and when reproducing the text, print the tags unmodified where they belong at.
Iterate through this comma-separated list of languages enclosed in the brackets here [English, Russian, Spanish, Polish,Turkish,Arabic], doing this list of tasks enclosed in the brackets here [ Print translation ] in each iteration.
========================================
. __123_22__ Die Sonne scheint __38233__, es ist __566__ und ich bin __234534__. __3776__

Output:


"[English]\n. __123_22__ The sun is shining __38233__, it is
__566__ and I am __234534__. __3776__\n\n[Russian]\n. __123_22__ Солнце светит __38233
__, день __566__ и я __234534__. __3776__\n\n[Spanish]\n. __123_22__ El sol brilla __3
8233__, es __566__ y yo __234534__. __3776__\n\n[Polish]\n. __123_22__ Słońce świeci _
_38233__, jest __566__ i ja __234534__. __3776__\n\n[Turkish]\n. __123_22__ Güneş parl
ıyor __38233__, hava __566__ ve ben __234534__. __3776__\n\n[Arabic]\n. __123_22__ الش
مس تشرق __38233__، إنه __566__ وأنا __234534__. __3776__\n"

Arabic is strange… but whatever.
So far, so good.
Attempting alphanumeric tags instead of numeric-only ones results in START and END going missing. I have not succeeded there yet.
Performance is not so great: with 6 target languages it consumes as much time as 5 normal LTEngine calls, thus only about a one-sixth (≈17%) time saving.
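
Splitting that output back into per-language strings is easy on the Perl side, though (a rough sketch, assuming the [Language] headers come back exactly in this form):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read the raw model output (as shown above) from STDIN.
    my $out = do { local $/; <STDIN> };

    # Split on the [Language] headers; one entry per target language.
    my %translation;
    while ($out =~ /\[([^\]\n]+)\]\n(.*?)(?=\n\n\[|\z)/sg) {
        my ($lang, $text) = ($1, $2);
        $text =~ s/\s+\z//;            # trim trailing whitespace/newlines
        $translation{$lang} = $text;
    }

    print "$_:\n$translation{$_}\n\n" for sort keys %translation;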

Without tag handling:
Your instructions:

At the end of this document, there is a German text, from directly after the line consisting of 40 equal signs (=) to the end of the document.
Iterate through this comma-separated list of languages enclosed in the brackets here [English, Russian, Spanish, Polish, Turkish, Arabic], doing this list of tasks enclosed in the brackets here [ Print translation ] in each iteration.
========================================
Die Sonne scheint rot, es ist warm und ich bin zufrieden.

This only costs 220% instead of the 600% for 6 single-language LTEngine calls… almost a 3x performance boost.

I use Vast.ai (my referral code) to rent GPUs hourly for model training. The prices are affordable and it’s worked well for me.

2 Likes

I’ve used vast.ai in the past too, lowest prices.

2 Likes

So LTEngine is much more than a “bare” Gemma?
They say Gemma covers >140 languages, LTEngine about 40… so LTEngine is a sort of “trained Gemma”?

Is there a recommended procedure for switching to another .gguf, e.g. from the default 4B model to a larger one, from Gemma 3 to 4, or to Mistral, etc.?

Just curious: can I train LTEngine by, for example, feeding it all the texts and data I can find about a particular topic?

Sorry for my stupid questions… I am completely new to this, and it is so exciting… thank you for your patience with me.

Edit: Maybe you know some good primers on how to do this kind of thing with LLMs via an API?

LTEngine enables a subset of languages (but you can enable/test more by adding the language you need to LTEngine/ltengine/src/languages.rs at main · LibreTranslate/LTEngine · GitHub). It’s a matter of editing the prompt and verifying that the model can actually translate (don’t assume that it can, despite the claims).

1 Like