Yep, should be fixed. I want to do some more automation for generating the index JSON so I don’t keep having typos.
Isn’t there a problem with the Czech in the JSON?
Czech → English = cs
English → Czech = zh
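A mismatch like this could be caught automatically before the index is published. Here is a minimal sketch, assuming each index entry carries from_name, from_code, to_name and to_code fields (I have not checked the field names against the real index) and using a small hypothetical lookup table of expected codes:

```python
# Hypothetical lookup table of language names to expected codes;
# extend it with every language that appears in the index.
EXPECTED_CODES = {
    "Czech": "cs",
    "Chinese": "zh",
    "Danish": "da",
    "English": "en",
}


def find_code_mismatches(entries):
    """Return (name, found_code, expected_code) for every name/code mismatch."""
    problems = []
    for entry in entries:
        for side in ("from", "to"):
            # Field names are an assumption about the index layout.
            name = entry.get(f"{side}_name")
            code = entry.get(f"{side}_code")
            expected = EXPECTED_CODES.get(name)
            # Names missing from the table are skipped, not flagged.
            if expected is not None and code != expected:
                problems.append((name, code, expected))
    return problems


# Sample data reproducing the typo reported above.
sample_index = [
    {"from_name": "Czech", "from_code": "cs", "to_name": "English", "to_code": "en"},
    {"from_name": "English", "from_code": "en", "to_name": "Czech", "to_code": "zh"},
]

for name, code, expected in find_code_mismatches(sample_index):
    print(f"{name}: found {code!r}, expected {expected!r}")
```

Running the same check over the real index file (loaded with json.load) before each update would flag this kind of typo, as long as the lookup table is kept current.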
w00t! Awesome. Love seeing more languages getting added.
Trained with CCAligned, Paracrawl, Europarl and WikiMatrix. Getting about ready for a retrain including CCMatrix…
I haven’t completely figured out this IPFS yet…
Great, once you’re done with Danish I can add it to the package index. No need to figure out IPFS, I can do it. I’ve been running ipfs add my-pkg.argosmodel to generate IPFS links but they’re mostly not used currently.
Well… I’m not sure what “done” means
I ditched CCMatrix and added OpenSubtitles instead. Trained to 30000.
Installed locally at https://translate.fortytwo-it.com/ and the translations do seem a lot better. Are there plus/minus votes? (Obviously there’s suggest.)
For the time being I have the Danish data files and training JSON here, but I can/will move the .argosdata files to the main /argosdata/ URL permanently.
The new Danish models are here
(This may also be useful for Section IV of my tutorial… what to do with them now! :D)
(I can probably run ipfs add, but when I tried to fetch the current files to help share them, it just sat there; nothing downloaded anywhere.)
I added the Danish .argosmodel packages to the package index and the .argosdata packages to the Argos Train data index.
Thanks for training these and helping document the process!
Of course.
Could you adjust the argosdata links to just Index of /argosdata? I stuck them in the /training/ sub-directory to keep them separate for, well, training.
I’ll leave them in both locations for the time being and just make a /training_no/ for my next project, Norwegian.
What are the validation rules for a new language? I’m finishing Brazilian Portuguese, which is very different from the Portuguese of Portugal. After several days of organizing the data sources it is almost complete, and I’m validating the translations now. As soon as it’s complete, I could use some help with how to send/share it.
There aren’t any specific validation rules. If you have a trained model or data you want to submit you can make a post or pull request.
Currently for Portuguese there are already trained models but no data packages. To replace the current models we would want a demonstration that the new models are an improvement over the existing ones.
I would like to know whether there is a version of the argos-train repository that uses TensorFlow, or whether the training checkpoints in Google Drive can be used with OpenNMT-py. After a lot of research I was in doubt, because the checkpoints appear to be TensorFlow ones.
I found them here: Checkpoint Exports – Google Drive
After searching a lot I found the command ct2-opennmt-tf-converter and it worked fine.