Thanks for the help! Locomotive sounds very promising for a newbie like me. I tried using it, but I ran into the following error:
$ python train.py --config en-pt-config.json --tensorboard
Training English --> Portuguese (1.1)
Sources: 14
Corrupted .zip file, redownloading /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843.zip
https://object.pouta.csc.fi/OPUS-CAPES/v1/moses/en-pt.txt.zip [100%]
Extracting /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843.zip to /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843
https://object.pouta.csc.fi/OPUS-EMEA/v3/moses/en-pt.txt.zip [100%]
Extracting /home/bruno/Desktop/Locomotive/cache/0d79d68a9ab037aa16888c445723dd3d.zip to /home/bruno/Desktop/Locomotive/cache/0d79d68a9ab037aa16888c445723dd3d
- https://object.pouta.csc.fi/OPUS-NLLB/v1/moses/en-pt.txt.zip (hash:7976cb0 | weight:1)
- https://object.pouta.csc.fi/OPUS-ParaCrawl/v9/moses/en-pt.txt.zip (hash:720f36b | weight:1)
- https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/moses/en-pt.txt.zip (hash:803d6e9 | weight:1)
- https://object.pouta.csc.fi/OPUS-ELRC-EMEA/v1/moses/en-pt.txt.zip (hash:e878016 | weight:1)
- https://object.pouta.csc.fi/OPUS-LinguaTools-WikiTitles/v2014/moses/en-pt.txt.zip (hash:d527c9c | weight:1)
- https://object.pouta.csc.fi/OPUS-XLEnt/v1.2/moses/en-pt.txt.zip (hash:52c0f5a | weight:1)
- https://object.pouta.csc.fi/OPUS-EUbookshop/v2/moses/en-pt.txt.zip (hash:7f3b573 | weight:1)
- https://object.pouta.csc.fi/OPUS-TildeMODEL/v2018/moses/en-pt.txt.zip (hash:cd826c2 | weight:1)
- https://object.pouta.csc.fi/OPUS-SciELO/v1/moses/en-pt.txt.zip (hash:f48f89e | weight:1)
- https://object.pouta.csc.fi/OPUS-Europarl/v8/moses/en-pt.txt.zip (hash:2f34d28 | weight:1)
- https://object.pouta.csc.fi/OPUS-Wikipedia/v1.0/moses/en-pt.txt.zip (hash:debbe41 | weight:1)
- https://object.pouta.csc.fi/OPUS-JRC-Acquis/v3.0/moses/en-pt.txt.zip (hash:1451206 | weight:1)
- https://object.pouta.csc.fi/OPUS-CAPES/v1/moses/en-pt.txt.zip (hash:369ede1 | weight:1)
- https://object.pouta.csc.fi/OPUS-EMEA/v3/moses/en-pt.txt.zip (hash:0d79d68 | weight:1)
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.1.0.json: 122kB [00:00, 64.5MB/s]
2023-12-09 09:14:28 INFO: Downloading these customized packages for language: en (English)...
=======================
| Processor | Package |
-----------------------
| tokenize | ewt |
=======================
Downloading http://nlp.stanford.edu/software/stanza/1.1.0/en/tokenize/ewt.pt: 100%|███████████████████████| 631k/631k [00:00<00:00, 8.74MB/s]
2023-12-09 09:14:28 INFO: Finished downloading models and saved to /home/bruno/Desktop/Locomotive/run/en_pt-1.1/stanza.
Downloading flores200 dataset...
Wrote /home/bruno/Desktop/Locomotive/run/en_pt-1.1/src-val.txt
Wrote /home/bruno/Desktop/Locomotive/run/en_pt-1.1/tgt-val.txt
sentencepiece_trainer.cc(77) LOG(INFO) Starts training with :
trainer_spec {
input: /home/bruno/Desktop/Locomotive/cache/7976cb079cc3eb3f4bb601122b2511a5/NLLB.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/720f36b53ed6eaca1b7c7ff318ff4ef0/ParaCrawl.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/803d6e9518ca8550b1e9f1be6901f52d/OpenSubtitles.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/e8780165d3f4346430d77b3a6516706e/ELRC-EMEA.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/d527c9c660ca65084ef986260616f531/LinguaTools-WikiTitles.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/52c0f5a30f62a5d3497efa656eacd0d0/XLEnt.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/7f3b57309e253830146f5ab11c02f4d4/EUbookshop.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/cd826c2a5300ea74f33f7d4893646d7e/TildeMODEL.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/f48f89e440549427ea57e582ffa10535/SciELO.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/2f34d28d7a42dd9ff48a65ded8afe0c2/Europarl.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/debbe4100b212af08032982cb5524aa8/Wikipedia.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/14512066e78758d370056541ed29abde/JRC-Acquis.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843/CAPES.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/0d79d68a9ab037aa16888c445723dd3d/EMEA.en-pt.en
input: /home/bruno/Desktop/Locomotive/cache/7976cb079cc3eb3f4bb601122b2511a5/NLLB.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/720f36b53ed6eaca1b7c7ff318ff4ef0/ParaCrawl.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/803d6e9518ca8550b1e9f1be6901f52d/OpenSubtitles.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/e8780165d3f4346430d77b3a6516706e/ELRC-EMEA.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/d527c9c660ca65084ef986260616f531/LinguaTools-WikiTitles.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/52c0f5a30f62a5d3497efa656eacd0d0/XLEnt.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/7f3b57309e253830146f5ab11c02f4d4/EUbookshop.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/cd826c2a5300ea74f33f7d4893646d7e/TildeMODEL.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/f48f89e440549427ea57e582ffa10535/SciELO.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/2f34d28d7a42dd9ff48a65ded8afe0c2/Europarl.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/debbe4100b212af08032982cb5524aa8/Wikipedia.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/14512066e78758d370056541ed29abde/JRC-Acquis.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843/CAPES.en-pt.pt
input: /home/bruno/Desktop/Locomotive/cache/0d79d68a9ab037aa16888c445723dd3d/EMEA.en-pt.pt
input_format:
model_prefix: /home/bruno/Desktop/Locomotive/run/en_pt-1.1/sentencepiece
model_type: UNIGRAM
vocab_size: 50000
self_test_sample_size: 0
character_coverage: 1
input_sentence_size: 1000000
shuffle_input_sentence: 1
seed_sentencepiece_size: 1000000
shrinking_factor: 0.75
max_sentence_length: 4192
num_threads: 16
num_sub_iterations: 2
max_sentencepiece_length: 16
split_by_unicode_script: 1
split_by_number: 1
split_by_whitespace: 1
split_digits: 0
pretokenization_delimiter:
treat_whitespace_as_suffix: 0
allow_whitespace_only_pieces: 0
required_chars:
byte_fallback: 0
vocabulary_output_piece_score: 1
train_extremely_large_corpus: 0
hard_vocab_limit: 1
use_all_vocab: 0
unk_id: 0
bos_id: 1
eos_id: 2
pad_id: -1
unk_piece: <unk>
bos_piece: <s>
eos_piece: </s>
pad_piece: <pad>
unk_surface: ⁇
enable_differential_privacy: 0
differential_privacy_noise_level: 0
differential_privacy_clipping_threshold: 0
}
normalizer_spec {
name: nmt_nfkc
add_dummy_prefix: 1
remove_extra_whitespaces: 1
escape_whitespaces: 1
normalization_rule_tsv:
}
denormalizer_spec {}
trainer_interface.cc(351) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(183) LOG(INFO) Loading corpus: /home/bruno/Desktop/Locomotive/cache/7976cb079cc3eb3f4bb601122b2511a5/NLLB.en-pt.en
trainer_interface.cc(145) LOG(INFO) Loaded 1000000 lines
trainer_interface.cc(145) LOG(INFO) Loaded 2000000 lines
...
trainer_interface.cc(145) LOG(INFO) Loaded 639000000 lines
trainer_interface.cc(183) LOG(INFO) Loading corpus: /home/bruno/Desktop/Locomotive/cache/debbe4100b212af08032982cb5524aa8/Wikipedia.en-pt.pt
trainer_interface.cc(145) LOG(INFO) Loaded 640000000 lines
trainer_interface.cc(145) LOG(INFO) Loaded 641000000 lines
trainer_interface.cc(183) LOG(INFO) Loading corpus: /home/bruno/Desktop/Locomotive/cache/14512066e78758d370056541ed29abde/JRC-Acquis.en-pt.pt
trainer_interface.cc(145) LOG(INFO) Loaded 642000000 lines
trainer_interface.cc(145) LOG(INFO) Loaded 643000000 lines
trainer_interface.cc(183) LOG(INFO) Loading corpus: /home/bruno/Desktop/Locomotive/cache/369ede1b2b2c69007995e80f2f5d4843/CAPES.en-pt.pt
trainer_interface.cc(145) LOG(INFO) Loaded 644000000 lines
trainer_interface.cc(183) LOG(INFO) Loading corpus: /home/bruno/Desktop/Locomotive/cache/0d79d68a9ab037aa16888c445723dd3d/EMEA.en-pt.pt
trainer_interface.cc(145) LOG(INFO) Loaded 645000000 lines
trainer_interface.cc(409) LOG(INFO) Sampled 1000000 sentences from 645435140 sentences.
trainer_interface.cc(414) LOG(INFO) Skipped 591 too long sentences.
trainer_interface.cc(423) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(423) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(423) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(428) LOG(INFO) Normalizing sentences...
trainer_interface.cc(537) LOG(INFO) all chars count=99032336
trainer_interface.cc(548) LOG(INFO) Done: 100% characters are covered.
trainer_interface.cc(558) LOG(INFO) Alphabet size=1451
trainer_interface.cc(559) LOG(INFO) Final character coverage=1
trainer_interface.cc(591) LOG(INFO) Done! preprocessed 999999 sentences.
unigram_model_trainer.cc(222) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(226) LOG(INFO) Extracting frequent sub strings... node_num=48319884
unigram_model_trainer.cc(274) LOG(INFO) Initialized 1001451 seed sentencepieces
trainer_interface.cc(597) LOG(INFO) Tokenizing input sentences with whitespace: 999999
trainer_interface.cc(608) LOG(INFO) Done! 946659
unigram_model_trainer.cc(564) LOG(INFO) Using 946659 sentences for EM training
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=372312 obj=12.3229 num_tokens=2076375 num_tokens/piece=5.57698
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=329770 obj=9.88506 num_tokens=2087466 num_tokens/piece=6.33007
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=247304 obj=9.86642 num_tokens=2169589 num_tokens/piece=8.77296
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=247138 obj=9.8551 num_tokens=2170183 num_tokens/piece=8.78126
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=185352 obj=9.91244 num_tokens=2284923 num_tokens/piece=12.3275
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=185347 obj=9.89977 num_tokens=2284917 num_tokens/piece=12.3278
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=139010 obj=9.97952 num_tokens=2411614 num_tokens/piece=17.3485
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=139010 obj=9.96461 num_tokens=2411552 num_tokens/piece=17.348
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=104257 obj=10.0664 num_tokens=2545622 num_tokens/piece=24.4168
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=104257 obj=10.0481 num_tokens=2545638 num_tokens/piece=24.417
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=78192 obj=10.1762 num_tokens=2683771 num_tokens/piece=34.3228
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=78192 obj=10.153 num_tokens=2683913 num_tokens/piece=34.3246
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=58644 obj=10.3113 num_tokens=2826789 num_tokens/piece=48.2025
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=58644 obj=10.2812 num_tokens=2826943 num_tokens/piece=48.2052
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=0 size=55000 obj=10.318 num_tokens=2858431 num_tokens/piece=51.9715
unigram_model_trainer.cc(580) LOG(INFO) EM sub_iter=1 size=55000 obj=10.3112 num_tokens=2858560 num_tokens/piece=51.9738
trainer_interface.cc(686) LOG(INFO) Saving model: /home/bruno/Desktop/Locomotive/run/en_pt-1.1/sentencepiece.model
trainer_interface.cc(698) LOG(INFO) Saving vocabs: /home/bruno/Desktop/Locomotive/run/en_pt-1.1/sentencepiece.vocab
Wrote /home/bruno/Desktop/Locomotive/run/en_pt-1.1/config.yml
Converting /home/bruno/Desktop/Locomotive/run/en_pt-1.1/sentencepiece.vocab
Traceback (most recent call last):
  File "/home/bruno/Desktop/Locomotive/train.py", line 346, in <module>
    sp_vocab_to_onmt_vocab(sp_vocab_file, onmt_vocab_file)
  File "/home/bruno/Desktop/Locomotive/onmt_tools.py", line 51, in sp_vocab_to_onmt_vocab
    w, c = line.rstrip("\n").split(None, 1)
ValueError: not enough values to unpack (expected 2, got 1)
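From the traceback, I think the failing line is the one that parses the generated sentencepiece.vocab. As far as I understand, each line of that file is a piece and a score separated by a tab, so splitting on generic whitespace could return a single field if a piece happens to be a whitespace-like character itself. That is only my guess; here is a minimal sketch of the failure mode I have in mind (not Locomotive's actual code, and the "bad" piece is purely hypothetical):

# Sketch of my guess at the failure (hypothetical data, not Locomotive's code).
# sentencepiece .vocab lines look like "piece<TAB>score".
good = "▁the\t-3.4567"
bad = "\u2028\t-12.3456"  # hypothetical piece that is itself a Unicode whitespace char

print(good.rstrip("\n").split(None, 1))  # ['▁the', '-3.4567'] -> unpacks into two values
print(bad.rstrip("\n").split(None, 1))   # ['-12.3456'] -> only one value, so the unpack raises
print(bad.rstrip("\n").split("\t", 1))   # two fields again when splitting on the tab

Splitting on the tab avoids the crash in this sketch, but I don't know whether that is the right fix or whether it would just hide a bad vocabulary entry.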
My config.json looks like this:
{
  "from": {
    "name": "English",
    "code": "en"
  },
  "to": {
    "name": "Portuguese",
    "code": "pt"
  },
  "version": "1.1",
  "sources": [
    "opus://NLLB",
    "opus://ParaCrawl",
    "opus://OpenSubtitles",
    "opus://ELRC-EMEA",
    "opus://LinguaTools-WikiTitles",
    "opus://XLEnt",
    "opus://EUbookshop",
    "opus://TildeMODEL",
    "opus://SciELO",
    "opus://Europarl",
    "opus://Wikipedia",
    "opus://JRC-Acquis",
    "opus://CAPES",
    "opus://EMEA"
  ]
}
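If it helps, I can also check which lines of the generated vocab don't split into two fields; this is the quick check I was going to run (the path is from my run directory, and the approach is just my own debugging idea):

# Quick check (my own idea): print any sentencepiece.vocab line that does not
# split into two whitespace-separated fields, using repr() so invisible
# characters show up.
vocab_path = "/home/bruno/Desktop/Locomotive/run/en_pt-1.1/sentencepiece.vocab"

with open(vocab_path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        if len(line.rstrip("\n").split(None, 1)) != 2:
            print(lineno, repr(line))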
Thanks in advance!