Google just announced Gemma 3 QAT Models: Bringing state-of-the-Art AI to consumer GPUs - Google Developers Blog
I’ve tested the Gemma 3 27B model on translation tasks from English <=> Italian on my local machine and… it’s quite impressive!
I’ve used this prompt in my tests:
translate from English to Italian, only output the best translation, no explanations, capturing the nuances of the sentence, paying attention to masculine/feminine/plural, proper use of articles and grammar.
<sentence>
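For anyone who wants to reproduce this locally, here is a minimal sketch that wraps the prompt above around a sentence and sends it to a locally running model. The Ollama endpoint and the `gemma3:27b` model tag are assumptions about one common local setup, not something from the post itself; adjust for whatever runtime you use.

```python
# Sketch: wrapping the translation prompt from the post around a sentence
# and sending it to a locally running model. The Ollama endpoint and the
# model tag "gemma3:27b" are assumptions; adjust for your local setup.
import json
import urllib.request

PROMPT_TEMPLATE = (
    "translate from English to Italian, only output the best translation, "
    "no explanations, capturing the nuances of the sentence, paying attention "
    "to masculine/feminine/plural, proper use of articles and grammar.\n"
    "{sentence}"
)

def build_translation_prompt(sentence: str) -> str:
    """Fill the prompt template from the post with the sentence to translate."""
    return PROMPT_TEMPLATE.format(sentence=sentence)

def translate(sentence: str, url: str = "http://localhost:11434/api/generate") -> str:
    """Send the prompt to a local Ollama server (assumed setup) and return the reply."""
    payload = json.dumps({
        "model": "gemma3:27b",  # assumed local model tag
        "prompt": build_translation_prompt(sentence),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```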
---
This is awesome.
Gemma managed to translate my UI templates so well that it even ordered the template words correctly, depending on where they belong.
I have read somewhere that Gemma 2 is in some cases even better: all 27B parameters are devoted to text, with no image, audio, or video support. This could compensate to some degree for its lower technical abilities.
Definitely need to play around with these machines these days.
One of the things I want to look into is the performance when choosing multiple target languages concurrently.
The tokenization and the inference are language-independent, if I understand correctly, so outputting the result for multiple languages in one go could boost performance a lot.
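One way to try the single-generation idea is to ask for all target languages in one prompt and parse the labelled output. This is only a sketch of that approach: the "Language: translation" label convention is my assumption, and the model would need to be prompted to follow it reliably.

```python
# Sketch: requesting several target languages in one generation, then parsing
# the labelled output. The "Italian: ..." line convention is an assumption.
def build_multi_target_prompt(sentence: str, languages: list[str]) -> str:
    """Build one prompt covering all target languages at once."""
    names = ", ".join(languages)
    return (
        f"translate from English to the following languages: {names}. "
        "Output one line per language, prefixed with the language name and a colon, "
        "no explanations.\n" + sentence
    )

def parse_multi_target_output(text: str) -> dict[str, str]:
    """Split 'Italian: Buongiorno' style lines into a {language: translation} dict."""
    result = {}
    for line in text.splitlines():
        if ":" in line:
            lang, _, translation = line.partition(":")
            result[lang.strip()] = translation.strip()
    return result
```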
So adding one more output language might only take, say, 20-30% more compute time per added language.
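Under that rough assumption (each extra output language costs about 20-30% of the single-language time, versus a full 100% for a separate run), the potential speedup can be sketched like this; the 25% figure below is just a midpoint of the guess above, not a measured number:

```python
# Sketch of the rough cost model from the post: each extra output language
# adds ~20-30% of the single-language time (0.25 used as a midpoint guess),
# versus 100% for running each language as a separate generation.
def batched_time(n_languages: int, base: float = 1.0, extra: float = 0.25) -> float:
    """Estimated time for one generation covering n languages."""
    return base * (1 + extra * (n_languages - 1))

def separate_time(n_languages: int, base: float = 1.0) -> float:
    """Estimated time for n independent single-language runs."""
    return base * n_languages
```

With these guesses, four target languages would take about 1.75x the single-language time batched versus 4x run separately, i.e. roughly a 2.3x speedup, if the assumption holds at all.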
The DeepL API accepts not only a single target language; you can also pass a comma-separated list of target languages.
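If LTEngine were to accept the same comma-separated form the post describes, the list would need to be split and normalized first. A tiny sketch of that parsing step; this is illustrative only and not the actual DeepL request schema:

```python
# Sketch: splitting a comma-separated target-language list (the form the post
# describes) into normalized language codes. Not the actual DeepL schema.
def parse_target_langs(spec: str) -> list[str]:
    """'it, de,fr' -> ['IT', 'DE', 'FR']"""
    return [code.strip().upper() for code in spec.split(",") if code.strip()]
```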
I think if such indeed boosts performance/energy efficiency, this could be a great thing to add to LTEngine (and maybe LibreTranslate too).
So there is a lot to play with when winter comes.