The model is trained on the following OPUS corpora, downloaded using opus-nlp-downloader:
bible-uedin, CCAligned, GNOME, NeuLab-TedTalks, OpenSubtitles, QED, TED2020, wikimedia, XLEnt
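
For anyone reproducing this, the corpora can also be fetched directly from the public OPUS API. The endpoint and response fields below are assumptions based on what tools like opustools query, so treat this as a rough sketch rather than a verified script:

```python
# Rough sketch (assumed: the OPUS API endpoint and its JSON shape;
# opus-nlp-downloader presumably wraps something similar).
# Requires the `requests` package.
import requests

OPUS_API = "https://opus.nlpl.eu/opusapi/"

def list_corpus_files(corpus, source="en", target="th"):
    """Return download URLs for the Moses-format release of one corpus."""
    params = {
        "corpus": corpus,
        "source": source,
        "target": target,
        "preprocessing": "moses",
        "version": "latest",
    }
    resp = requests.get(OPUS_API, params=params, timeout=30)
    resp.raise_for_status()
    return [item["url"] for item in resp.json().get("corpora", [])]

for corpus in ["OpenSubtitles", "TED2020", "QED"]:
    for url in list_corpus_files(corpus):
        print(url)
```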
This is my first attempt at creating a translation model. It is able to translate but has some issues with large texts. I am submitting it because there are currently no Thai translations in Argos Translate. I would like to create a Thai translation model with higher accuracy and would greatly appreciate any tips or links to documentation that could help me.
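
One workaround I am considering for the large-text issue (untested so far) is translating paragraph by paragraph with the argostranslate Python API instead of passing the whole document at once. The model filename below is hypothetical:

```python
# Untested workaround sketch: split long input on blank lines and
# translate each paragraph separately with argostranslate.
import argostranslate.package
import argostranslate.translate

# Hypothetical filename for the trained th->en model.
argostranslate.package.install_from_path("translate-th_en.argosmodel")

def translate_large(text, from_code="th", to_code="en"):
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        argostranslate.translate.translate(p, from_code, to_code)
        for p in paragraphs
        if p.strip()
    )
```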
I can also provide the .argosdata files if they are useful. I ran this on my own Windows machine using WSL and Docker; I have documented the steps in Obsidian (Markdown) and can provide these notes if they are useful.
If you train a th->en model I can add it to Argos Translate. Please also publish the .argosdata files and I can add them to the data index.
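
For reference, entries in the data index are JSON records roughly like the one below. The field names and values here are a sketch modeled on existing entries, so check data-index.json in the argos-train repository for the exact schema:

```python
# Sketch of a data index entry for a th->en corpus. Field names mirror
# existing entries in argos-train's data-index.json; verify against the
# actual file before submitting. All values below are placeholders.
import json

entry = {
    "name": "OpenSubtitles",
    "type": "data",
    "from_code": "th",
    "to_code": "en",
    "size": 1000000,  # sentence-pair count (placeholder value)
    "links": ["https://example.com/data-opensubtitles-th_en.argosdata"],
}
print(json.dumps(entry, indent=4))
```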
The best way to improve quality is to find more high-quality data. It looks like you used most of the data from OPUS, which is what I generally do. Finding Thai-specific data sources could also help.
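
Besides finding new sources, a simple cleaning pass over the noisier crawled corpora (CCAligned, XLEnt) sometimes helps. Here is a minimal sketch using deduplication and a character-length-ratio filter; the thresholds are guesses, and Thai text is denser than English, so they would need tuning:

```python
# Minimal sketch: basic cleaning of a Moses-format parallel corpus via
# deduplication plus a length-ratio filter, a common first pass for
# noisy crawled data. Thresholds below are guesses, not tuned values.
def clean_pairs(src_lines, tgt_lines, max_ratio=3.0, max_len=200):
    seen = set()
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue
        if len(src) > max_len or len(tgt) > max_len:
            continue
        # Drop pairs whose character-length ratio is implausible.
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        if ratio > max_ratio:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt

if __name__ == "__main__":
    src = ["Hello world.", "Hi", ""]
    tgt = ["สวัสดีชาวโลก", "สวัสดีครับ นี่เป็นประโยคที่ยาวมากเกินไป", ""]
    for s, t in clean_pairs(src, tgt):
        print(s, "|||", t)  # keeps only the first, plausible pair
```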
If you have good documentation, please publish that too. You could post it in this thread or somewhere else.
I have trained the other model. I noticed that it ran out of memory at around 9 hours, and I suspect the same thing may have happened to the previous one. It translates adequately, but with some of the same issues as the previous model. I will try rerunning the training when I have time.
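
To confirm whether the run is actually hitting an out-of-memory kill next time, I may log system memory alongside the training run. A small sketch using psutil (assuming psutil is available in the environment):

```python
# Sketch: log system memory once a minute alongside a training run so an
# OOM kill around hour 9 can be confirmed afterwards. Stop with Ctrl-C.
# Requires psutil (pip install psutil).
import time
import psutil

with open("memlog.csv", "w") as log:
    log.write("elapsed_s,used_gb,available_gb\n")
    start = time.time()
    while True:
        vm = psutil.virtual_memory()
        log.write(f"{time.time() - start:.0f},"
                  f"{vm.used / 1e9:.2f},{vm.available / 1e9:.2f}\n")
        log.flush()
        time.sleep(60)
```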
I have added the files at the following link: argostranslatedata.zip contains the .argosdata files, argostranslatedocs.zip contains the Markdown documentation (Obsidian-style format), and argostranslatemodel.zip contains both models.
The documentation looks good too. This forum supports Markdown and images, so you could make a post or series of posts here with the docs so that they're accessible to people.