LTEngine - LLM-powered local machine translation

Hey all :waving_hand:

I’m announcing a new project I’ve been working on called LTEngine.

It aims to offer a simple and easy way to run local, offline machine translation via large language models, while offering a server compatible with the LibreTranslate API. It’s currently powered by llama.cpp (via Rust bindings) running a variety of quantized Gemma3 models. The largest model (gemma3-27b) fits on a consumer RTX 3090 with 24 GB of VRAM, whereas the smaller models still run at decent speeds on a CPU alone.

The software compiles to a single, cross-platform, statically linked binary that includes everything. I’ve currently tested it on macOS and Windows, and will probably check Linux off the list in the coming days.
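
Once built (see the steps below), you can sanity-check the statically-linked claim with standard tooling; this is generic, nothing LTEngine-specific:

# Linux: a fully static binary reports no dynamic dependencies
ldd ./target/release/ltengine   # prints "not a dynamic executable" when static
# macOS: list linked libraries instead (system libraries will still appear)
otool -L ./target/release/ltengine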

To build and run:

Requirements:

  • Rust
  • clang
  • CMake
  • A C++ compiler (g++, MSVC) for building the llama.cpp bindings

Steps:

git clone https://github.com/LibreTranslate/LTEngine --recursive
cd LTEngine
cargo build [--features cuda,vulkan,metal] --release
./target/release/ltengine
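
Once the server is up, a quick smoke test is a request against the LibreTranslate-compatible /translate endpoint. The port below is an assumption (5000 is LibreTranslate’s default); check ltengine’s startup output for the actual address:

curl -s http://localhost:5000/translate \
  -H "Content-Type: application/json" \
  -d '{"q": "Hello world", "source": "en", "target": "it", "format": "text"}'
# expected response shape (LibreTranslate API): {"translatedText": "..."}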

In my preliminary testing on English <=> Italian (which I can evaluate as a native speaker), the 12B and 27B models perform as well as, or outperform, DeepL on a variety of inputs. Obviously this is not conclusive, and I’m releasing this first version early to encourage feedback and testing.

The main drawback of this project compared to the current implementation of LibreTranslate is speed and memory usage. Since the models are much larger than the lightweight transformer models of argos-translate, inference takes a while and memory requirements are much higher. I don’t think this will replace LibreTranslate; rather, it offers a tradeoff between speed and quality. I think it will mostly be deployed in local, closed environments rather than offered publicly on internet-facing servers.

The project uses the Gemma3 family of LLMs, but people can experiment with other language models like Llama or Qwen; as long as they work with llama.cpp, they will work with LTEngine.
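
For example, to sanity-check that a model loads under llama.cpp before pointing LTEngine at it, you can run it through llama-cli (the model path below is a placeholder; any instruction-tuned GGUF should do):

llama-cli -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf \
  -p "Translate from English to Italian: Hello world" -n 64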

Looking forward to hearing your feedback :pray:

I wish more software were just a statically linked executable that you can run.

Hi, I installed LTEngine and am testing it. It’s really cool. Could you tell me whether format: “html” will work in the API?

Currently not. It’s on the TODO list.
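
For reference, once supported, such a request would follow the upstream LibreTranslate API, where format accepts "text" or "html" (port assumed, as in the build example above):

curl -s http://localhost:5000/translate \
  -H "Content-Type: application/json" \
  -d '{"q": "<p>Hello <b>world</b></p>", "source": "en", "target": "it", "format": "html"}'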

I’ve been testing LTEngine for a couple of days now, translating from English to Ukrainian. Gemma3-12b gives a good translation, but worse than DeepL. Gemma3-27b (I tested it online, because my video card is too small for this model) is on par with DeepL, and sometimes better. How can I achieve maximum speed to translate thousands of texts per day? Should I install a new RTX 3090 24 GB or RTX 4090 24 GB video card?

24 GB will let you run Gemma3-27b locally. I think the biggest bottleneck is currently the mutex lock set here: LTEngine/ltengine/src/llm.rs at main · LibreTranslate/LTEngine · GitHub. I haven’t had time to dig into the llama implementation to understand why parallel contexts cannot run simultaneously, but I would think that removing the lock would let us run more translations per unit of time.
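
One way to observe that serialization from the client side is to time two concurrent requests against the server; with a global lock in place, the pair should take roughly twice as long as a single request (endpoint and port assumed, as in the earlier example):

# two requests in parallel; a global lock makes them complete back to back
time (
  curl -s http://localhost:5000/translate -H "Content-Type: application/json" \
    -d '{"q": "Hello", "source": "en", "target": "uk", "format": "text"}' &
  curl -s http://localhost:5000/translate -H "Content-Type: application/json" \
    -d '{"q": "World", "source": "en", "target": "uk", "format": "text"}' &
  wait
)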

That would be great. You are creating the best translator ever. Do you have a donation page for the project?

You can support the project financially by getting yourself an API key at https://portal.libretranslate.com and/or by supporting upstream libraries like argos-translate via Sponsor @argosopentech on GitHub Sponsors · GitHub.
