Help Wanted: Improve en-de translation

I trained two models (version 1.0.6) on Locomotive and tested them, using the same parameters as the DEEP ru_en/en_ru models (vocab 32000, feed_forward 4096, encoder layers 20, batch size 8192, accu 25; each model is 173MB). Training ran for only 20k train_steps, but progression is smooth, and DE-EN learns very little in the last 10k steps.
There is still some learning margin on EN_DE, though.
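For a sense of scale, here is a rough sketch of the training budget those settings imply. It assumes the batch size is counted in tokens and that training ran on a single GPU; neither is stated above, and the function names are only illustrative.

```python
# Back-of-the-envelope training budget from the settings quoted in the post.
# Assumptions (not stated in the post): batch size is measured in tokens,
# and training used a single GPU.
BATCH_SIZE_TOKENS = 8192
ACCUM_COUNT = 25
TRAIN_STEPS = 20_000


def tokens_per_step(batch_size: int, accum: int, gpus: int = 1) -> int:
    """Upper bound on tokens contributing to one optimizer update."""
    return batch_size * accum * gpus


def total_tokens(batch_size: int, accum: int, steps: int, gpus: int = 1) -> int:
    """Upper bound on tokens processed over the whole run."""
    return tokens_per_step(batch_size, accum, gpus) * steps


if __name__ == "__main__":
    per_step = tokens_per_step(BATCH_SIZE_TOKENS, ACCUM_COUNT)
    overall = total_tokens(BATCH_SIZE_TOKENS, ACCUM_COUNT, TRAIN_STEPS)
    print(f"tokens per update: {per_step:,}")
    print(f"tokens over {TRAIN_STEPS:,} steps: {overall:,}")
```

These are upper bounds, since real batches are rarely filled to the last token.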

Sources (opus)
All ELRC-German_Foreign_Offic; CCMatrix (top 20%); Open Subtitles (top 70% for weight); EuroPat (top 70% for weight); DGT; EuroParl; EUbookshop

de_en : ppl 9.1881 BLEU 60.64265 (eval.py)
en_de : ppl 10.2332 BLEU 46.43329
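To put the two perplexities on a common footing, they can be converted back to an average per-token loss. This is a small sketch that assumes ppl is reported as the exponential of the mean per-token cross-entropy in nats, which is the usual convention but is not confirmed here.

```python
import math

# Perplexities quoted above. Assumption: ppl = exp(mean cross-entropy in nats).
PPL = {"de_en": 9.1881, "en_de": 10.2332}


def cross_entropy_nats(ppl: float) -> float:
    """Mean per-token cross-entropy (nats) implied by a perplexity."""
    return math.log(ppl)


def cross_entropy_bits(ppl: float) -> float:
    """Same quantity expressed in bits per token."""
    return math.log2(ppl)


if __name__ == "__main__":
    for pair, ppl in PPL.items():
        print(f"{pair}: {cross_entropy_nats(ppl):.4f} nats, "
              f"{cross_entropy_bits(ppl):.4f} bits per token")
```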

If you need a version number with only two digits (e.g. 1.1), note that I am currently trying to improve on these models using the excerpt filter on CCMatrix.

Don’t worry about the version number, I can change it to “1.9” once you have your final model trained.

The models look good! Here’s some text I ran through them:

English Source Text

In the preface to my translation of the “Iliad” I have given my views as to the main principles by which a translator should be guided, and need not repeat them here, beyond pointing out that the initial liberty of translating poetry into prose involves the continual taking of more or less liberty throughout the translation; for much that is right in poetry is wrong in prose, and the exigencies of readable prose are the first things to be considered in a prose translation. That the reader, however, may see how far I have departed from strict construe, I will print here Messrs. Butcher and Lang’s translation of the sixty lines or so of the “Odyssey.” Their translation runs:

- Butler Translation Preface Homer’s Odyssey Project Gutenberg

German Translation (1.0.6)

Im Vorwort zu meiner Übersetzung der “Ilias” habe ich meine Ansichten zu den Hauptprinzipien gegeben, nach denen ein Übersetzer geführt werden sollte, und sie müssen sie hier nicht wiederholen, außer darauf hinzuweisen, dass die anfängliche Freiheit, Poesie in Prosa zu übersetzen, das ständige Nehmen von mehr oder weniger Freiheit während der Übersetzung beinhaltet; denn vieles, was in Poesie richtig ist, ist in Prosa falsch, und die Notwendigkeiten lesbarer Prosa sind die ersten Dinge, die in einer Prosaübersetzung berücksichtigt werden. Daß der Leser jedoch sehen mag, wie weit ich von der strengen Auslegung abgewichen bin, werde ich hier die Herren drucken. Butcher und Langs Übersetzung der sechzig Zeilen oder so der “Odyssee”. Ihre Übersetzung läuft:

English Back Translation (1.0.6)

In the preface to my translation of the “Iliad”, I have given my views on the main principles by which a translator should be guided, and they do not have to repeat them here, except to point out that the initial freedom to translate poetry into prose involves the constant taking of more or less freedom during translation; For much of what is right in poetry is wrong in prose, and the necessities of legible prose are the first things to be considered in a prose translation. However, that the reader may see how far I have departed from the strict interpretation, I will print the gentlemen here. Butcher and Lang’s translation of the sixty lines or so the “Odyssey”. Your translation is ongoing:
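A quick way to spot where a round trip drifts is to compare the source and the back translation clause by clause. The sketch below uses only Python's standard `difflib`; the word-level ratio is a crude similarity signal, not BLEU, and the helper name and the choice of excerpts are my own.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Short excerpts taken from the source text and the 1.0.6 back translation.
PAIRS = [
    ("and need not repeat them here",
     "and they do not have to repeat them here"),
    ("Their translation runs:",
     "Your translation is ongoing:"),
]

if __name__ == "__main__":
    for source, back in PAIRS:
        print(f"{word_similarity(source, back):.2f}  {source!r} -> {back!r}")
```

Low-scoring clauses are the ones worth reading by hand; a high score does not by itself mean the German in between was correct.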

They are pretty good, but if I apply all the tricks from my discussions with lynxpda, they could be even better, so I am giving it a try.

Two weeks later, I haven’t succeeded in improving the model. I spent the last three days running training runs from the raw dataset, and realized there is a fair amount of randomness involved in the final result.
Right now, I am still trying to determine which parameters can tell apart, within a few hours of training, a dataset that won’t yield a good model from one that will.
I’ll publish a post next week with what I found out.
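One generic way to reason about run-to-run noise is to repeat the same configuration a few times and only trust a change that clears the observed spread. This is a minimal sketch of that idea, not the method used in the experiments above; the function names and the two-standard-deviation threshold are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)


def exceeds_noise(baseline_scores: Sequence[float],
                  candidate_score: float,
                  k: float = 2.0) -> bool:
    """True if the candidate beats the baseline mean by more than k std devs.

    k = 2.0 is an arbitrary illustrative threshold, not a recommendation.
    """
    avg, spread = summarize_runs(baseline_scores)
    return candidate_score > avg + k * spread
```

Here `baseline_scores` would hold, for example, the BLEU of several runs on the same dataset with different seeds, and `candidate_score` the BLEU from a filtered dataset.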