Language Support

Great, once you’re done with Danish I can add it to the package index. No need to figure out IPFS; I can do that. I’ve been running ipfs add my-pkg.argosmodel to generate IPFS links, but they’re mostly not used currently.
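A minimal Python sketch of that workflow, assuming the Kubo (go-ipfs) ipfs CLI is installed and its repository initialised. ipfs add -Q prints only the final content identifier (CID), which is then turned into a link on the public ipfs.io gateway. The helper names and the choice of gateway are illustrative assumptions, not part of Argos Translate. A gateway can only serve the file while some IPFS node that holds it is online and reachable.

```python
import subprocess
import sys
from pathlib import Path

# Public HTTP gateway used to build shareable links (assumption: any gateway would do).
GATEWAY = "https://ipfs.io/ipfs/"


def ipfs_add(path: Path) -> str:
    """Add a file to the local IPFS node and return its CID.

    Runs `ipfs add -Q`, which prints only the final hash. Requires the
    `ipfs` CLI on PATH and an initialised repository.
    """
    result = subprocess.run(
        ["ipfs", "add", "-Q", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def gateway_link(cid: str) -> str:
    """Turn a CID into an HTTP gateway URL."""
    return GATEWAY + cid


if __name__ == "__main__":
    # Hypothetical usage: python ipfs_links.py my-pkg.argosmodel [more.argosmodel ...]
    for name in sys.argv[1:]:
        print(name, gateway_link(ipfs_add(Path(name))))
```

Keeping the subprocess call separate from the link builder means the URL format can be changed (for example to a different gateway) without touching the IPFS step.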

jefs42 · May 13, 2022, 12:10am

Well… I’m not sure what “done” means.
I ditched CCMatrix and added OpenSubtitles instead, then trained to 30000 steps.

I installed them locally at https://translate.fortytwo-it.com/ and the translations do seem a lot better. Is there plus/minus voting? (Obviously there’s Suggest.)

For the time being I have the Danish data files and training JSON here, but I will move the .argosdata files to the main /argosdata/ URL permanently.

The new Danish models are here.

(This may also be useful for Section IV of my tutorial: what to do with them now! :D)

(I can probably run ipfs add myself. I tried fetching the current files to help share them, but it just sat there and nothing downloaded.)

argosopentech · May 13, 2022, 12:09pm

I added the Danish .argosmodel packages to the package index and the .argosdata packages to the Argos Train data index.

Thanks for training these and helping document the process!

jefs42 · May 13, 2022, 3:52pm

Of course.
Could you point the argosdata links at the top-level Index of /argosdata instead? I put them in the /training/ subdirectory to keep them separate for, well, training.

I’ll leave them in both locations for the time being and just make a /training_no/ directory for my next project, Norwegian.

tallesairan · June 10, 2022, 11:39pm

What are the validation rules for a new language? I’m finishing Brazilian Portuguese (Portuguese from Portugal is very different). After several days of organizing the data sources it is almost complete, and I’m now validating the translations. As soon as it’s done I can share it; how should I send it?

argosopentech · June 10, 2022, 11:45pm

There aren’t any specific validation rules. If you have a trained model or data you want to submit you can make a post or pull request.

Currently for Portuguese there are already trained models but no data packages. To replace the current models we would want a demonstration that the new models are an improvement over the existing ones.
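One common way to make that kind of demonstration (an assumption here, not a stated project requirement) is to translate the same held-out test set with both models and compare BLEU scores. The sketch below is a simplified corpus BLEU: one reference per sentence, whitespace tokenisation, uniform weights up to 4-grams, and the standard brevity penalty. Function names and toy sentences are illustrative. For published numbers a standard tool such as sacreBLEU is preferable, because its fixed tokenisation makes scores comparable, and this sketch will not match it exactly.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes one reference per sentence and whitespace tokenisation.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(r, n)
            # Clipped matches: each hypothesis n-gram counts at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram])
                for gram, count in ngram_counts(h, n).items()
            )
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Any zero n-gram precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def compare_models(old_outputs, new_outputs, references):
    """Score both models on the same references; report whether the new one wins."""
    old_score = corpus_bleu(old_outputs, references)
    new_score = corpus_bleu(new_outputs, references)
    return old_score, new_score, new_score > old_score


if __name__ == "__main__":
    # Toy data for illustration only; use a real held-out test set in practice.
    refs = ["the cat sat on the mat today"]
    old = ["the cat sat on a mat"]
    new = ["the cat sat on the mat today"]
    print(compare_models(old, new, refs))
```

The test sentences should not appear in either model's training data; otherwise the comparison rewards memorisation rather than translation quality.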

tallesairan · June 13, 2022, 4:25pm

Is there a version of the argos-train repository that uses TensorFlow? Or can the training checkpoints on Google Drive be used with OpenNMT-py? After a lot of research I’m still unsure, because they appear to be TensorFlow checkpoints.
I found them here: Checkpoint Exports – Google Drive

tallesairan · June 13, 2022, 9:53pm

After a lot of searching I found the command

ct2-opennmt-tf-converter

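and it worked fine. (It converts OpenNMT-tf checkpoints into the CTranslate2 model format that Argos Translate uses.)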