Gemma4 Released

Support for 140 languages
Create multilingual experiences that go beyond translation and understand cultural context.

When it comes to translations, has anyone done a comparison of how well the various weights of the new Gemma4 variant stack up against the previous top performer, translategemma:27b? It would be good to see a real breakdown of the difference, both in terms of accuracy and tokens/second speed, both straight and at all of the various cache quantization levels under Flash Attention.
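For the tokens/second side, something like this minimal harness is what I have in mind. It assumes an Ollama-style `/api/generate` endpoint that returns `eval_count` (tokens generated) and `eval_duration` (nanoseconds); the model names and server URL would be whatever your setup uses.

```python
# Rough tokens/sec benchmark sketch, assuming an Ollama-style local API.
# The URL and response fields (eval_count, eval_duration) follow Ollama's
# /api/generate convention; adjust for your server.
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert generated-token count and nanosecond duration to tokens/sec."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model: str, prompt: str,
              url: str = "http://localhost:11434/api/generate") -> float:
    """Run one non-streaming generation and return its decode speed."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])
```

Running the same prompt set through each model size and each cache-quantization setting, then comparing the averages, would give the speed half of the breakdown; accuracy would still need a metric like BLEU or COMET on top.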

Haven’t tried it yet, but would love to know as well.

This is a slightly edited internal note. Gemma4 does not differ essentially from the latest TranslateGemma. I’ll post a dictionary of languages that can be fed to an agent or to a user later.

Introduction

This report evaluates TranslateGemma, a translation tool optimized for 55 languages, comparing its performance against earlier models such as madlad and against specialized models. The aim is to determine whether TranslateGemma adequately meets the linguistic needs of our corporate network.

Architecture and Performance

TranslateGemma offers a powerful architecture capable of translating 64 segments of 32,000 tokens simultaneously, although load tests reveal practical limitations. For example, the model can handle between 200 and 350 pages in the queue, with processing times varying depending on the format and language. Empty translations can occur when the queue exceeds 200 pages.
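These limits can be enforced client-side rather than discovered through empty translations. A minimal sketch, where the 200-page cap and 64-segment batch size come from the observations above but the helper names are mine:

```python
from typing import Iterator

MAX_QUEUE_PAGES = 200   # empty translations were observed beyond this point
BATCH_SEGMENTS = 64     # segments the model translates simultaneously

def batched(segments: list[str],
            batch_size: int = BATCH_SEGMENTS) -> Iterator[list[str]]:
    """Yield fixed-size batches of segments for one translation call."""
    for i in range(0, len(segments), batch_size):
        yield segments[i:i + batch_size]

def enqueue_pages(pages: list[str], queue: list[str]) -> list[str]:
    """Add pages to the queue, refusing growth past the safe cap."""
    if len(queue) + len(pages) > MAX_QUEUE_PAGES:
        raise RuntimeError(
            f"queue would exceed {MAX_QUEUE_PAGES} pages; "
            "submit smaller jobs to avoid empty translations")
    return queue + pages
```

Rejecting oversized jobs up front keeps the service inside the range where processing is reliable, instead of silently returning empty output.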

Qualitative Metrics

The performance of TranslateGemma’s machine translation models is exceptional, approaching perfection for 7 languages. However, some languages, such as Armenian, exhibit anomalies, including unwanted insertions and repetitions. Specialized models still outperform TranslateGemma for certain European language pairs and for languages at the other end of the spectrum, but TranslateGemma excels in its consistency over longer texts and its ease of maintenance.
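Runaway repetition of the kind seen for Armenian can be flagged cheaply before human review. A rough sketch; the n-gram size and threshold are assumptions of mine, not values from the evaluation:

```python
from collections import Counter

def has_runaway_repetition(text: str, n: int = 3,
                           max_repeats: int = 3) -> bool:
    """Flag a translation whose output repeats any word n-gram more than
    max_repeats times — a common symptom of degenerate decoding."""
    tokens = text.split()
    grams = Counter(tuple(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    return any(count > max_repeats for count in grams.values())
```

A filter like this will not catch subtle insertions, but it isolates the grossly degenerate outputs so reviewers only see plausible candidates.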

Limitations and Anomalies

Misalignments and fabrications are observed in some languages, particularly African and Asian languages. Adding context partially mitigates these issues, but some translations remain unusable. The phenomenon of “reward hacking” explains why the metrics can remain high while actual translation quality degrades.

Evaluation and Methodology

A direct evaluation was conducted using metrics such as BLEU and COMET, as well as a Large Language Model (Gemini) to assess the translations. The results indicate that 52 languages are suitable for speedy review, while others can still yield editable translations. Certain languages, such as Kazakh and Yoruba, are recommended for exclusion from the interface.
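For readers unfamiliar with the BLEU side of that methodology, its shape can be sketched in a few lines. This is a toy sentence-level variant with crude smoothing, purely illustrative; the actual evaluation would use a standard implementation such as sacrebleu:

```python
# Toy sentence-level BLEU: geometric mean of 1..4-gram precisions
# times a brevity penalty. Illustrative only; not the production metric.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_grams, ref_grams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_grams & ref_grams).values())  # clipped matches
        total = max(sum(hyp_grams.values()), 1)
        precisions.append(overlap / total if overlap else 1e-9)  # smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean
```

N-gram overlap scores like this are exactly what “reward hacking” can inflate, which is why the methodology pairs them with COMET and an LLM judge.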

Improvements and Recommendations

The report suggests adapting the interface to actual user needs by removing languages that are unusable and adding those that can be implemented after review. It also recommends grouping languages in a geo-economically coherent manner and limiting exposure to languages that cannot be revised, in order to avoid user frustration.

Conclusion

TranslateGemma represents a significant advancement, but its use should be adapted to practical needs. It is recommended to limit the agent API to actionable languages, while the web interface can expand the scope to benefit from user feedback.
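The two-tier policy recommended here is simple to encode. In this sketch every language set is an illustrative placeholder; only the Kazakh and Yoruba exclusions come from the evaluation above:

```python
# Illustrative two-tier language policy: a strict allowlist for the
# agent API, a broader one for the web interface. The specific
# language sets are placeholders, not the report's actual lists.
AGENT_LANGUAGES = {"fr", "de", "es", "it", "pt"}   # revisable, "actionable"
WEB_ONLY_LANGUAGES = {"hy", "sw", "vi"}            # exposed to gather feedback
EXCLUDED_LANGUAGES = {"kk", "yo"}                  # Kazakh, Yoruba

def allowed(lang: str, surface: str) -> bool:
    """Decide whether a language is exposed on a given surface."""
    if lang in EXCLUDED_LANGUAGES:
        return False
    if surface == "agent":
        return lang in AGENT_LANGUAGES
    if surface == "web":
        return lang in AGENT_LANGUAGES | WEB_ONLY_LANGUAGES
    return False
```

Keeping the agent API strict while letting the web interface range wider matches the recommendation: automation only sees languages whose output can be acted on, while users supply feedback on the rest.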

Ah yes, companies often make bold claims. But that’s why it’s good to verify. Thanks for the in-depth analysis!