Help Wanted: Estonian model for Argos Translate

I’m glad to introduce English↔Estonian translation models to the community.
The models use a Transformer DEEP architecture with 159M parameters, which is about 30% smaller than Transformer BIG and almost twice as fast at inference (in my observations it is comparable to Transformer Base, since the decoder has similar dimensions).

EN_ET

| Model                   | BLEU  | COMET-22 | Note |
|-------------------------|-------|----------|------|
| LT1.0                   | 29.1  | 0.9017   |      |
| GoogleTranslate         | 31    | 0.918    |      |
| eng-est/opus…2022-03-13 | 28.30 | -        |      |
| facebook/m2m100_1.2B    | 24.5  | -        |      |

ET_EN

| Model                   | BLEU  | COMET-22 | Note |
|-------------------------|-------|----------|------|
| LT1.0                   | 38.7  | 0.8903   |      |
| GoogleTranslate         | 42.4  | 0.8997   |      |
| est-eng/opus…2022-03-09 | 38.6  | -        |      |
| facebook/nllb-200-3.3B  | 35.7  | -        |      |
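
If you want to reproduce metrics like these, a minimal sketch along the following lines should work with the sacrebleu and unbabel-comet packages (file names here are placeholders, not the actual test sets I used):

```python
# Hypothetical evaluation sketch: score a hypothesis file against references
# with sacreBLEU and the COMET-22 model. File names are placeholders.
import sacrebleu
from comet import download_model, load_from_checkpoint

with open("hypotheses.et") as f:
    hyps = [line.strip() for line in f]
with open("references.et") as f:
    refs = [line.strip() for line in f]
with open("sources.en") as f:
    srcs = [line.strip() for line in f]

# Corpus-level BLEU with sacreBLEU's default tokenization.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU: {bleu.score:.1f}")

# COMET-22 (Unbabel/wmt22-comet-da) uses source, hypothesis, and reference.
model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
result = comet_model.predict(data, batch_size=16, gpus=1)  # gpus=0 for CPU
print(f"COMET-22: {result.system_score:.4f}")
```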

Back translation was also used to train the models:

  • All of Estonian Wikipedia plus two years of Estonian news.
  • About half of English Wikipedia, one year of English news, and news commentary.

In total, this gives about 35M back-translated sentence pairs.
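
For anyone unfamiliar with the technique, here is a minimal sketch of the back-translation idea (not my exact pipeline): monolingual Estonian sentences are translated into English with a reverse (ET→EN) model, and the synthetic English is paired with the authentic Estonian as extra EN→ET training data. `translate_et_en` is a placeholder for whatever reverse model is available.

```python
# Sketch of back translation: pair synthetic source text with authentic
# target text to create additional training data for the EN->ET direction.
def build_back_translation_pairs(et_sentences, translate_et_en):
    pairs = []
    for et in et_sentences:
        synthetic_en = translate_et_en(et)   # synthetic source side
        pairs.append((synthetic_en, et))     # authentic target side
    return pairs

# Example using Argos Translate as the reverse model (assumes the ET->EN
# package is already installed and a recent argostranslate version):
# import argostranslate.translate
# pairs = build_back_translation_pairs(
#     ["Tere, maailm!"],
#     lambda s: argostranslate.translate.translate(s, "et", "en"),
# )
```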

To be honest, both models were still training and improving their metrics (although progress had slowed) when they exhausted the GPU time limit.

Models:

translate-en_et-1_0.argosmodel

translate-et_en-1_0.argosmodel
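
If you would like to try them, here is a minimal usage sketch, assuming the two .argosmodel files above have been downloaded to the current directory and a recent argostranslate version is installed:

```python
# Install the downloaded packages and translate in both directions.
import argostranslate.package
import argostranslate.translate

argostranslate.package.install_from_path("translate-en_et-1_0.argosmodel")
argostranslate.package.install_from_path("translate-et_en-1_0.argosmodel")

print(argostranslate.translate.translate("Hello, world!", "en", "et"))
print(argostranslate.translate.translate("Tere, maailm!", "et", "en"))
```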
