Training New Language Tutorial (Work in Progress)

jefs42 · May 9, 2022, 3:36am

The info and video here are very helpful, but I needed more help with the details to finally get my first model trained (thanks for the help!).

Also, I’m old. So when I search for “How do I…”, I scroll past the video tutorials looking for actual words.

So I thought I’d write a text-only tutorial (with some screenshots, of course) based on what I learned.

Should I just nix it, or might it be helpful?
It’s not complete yet, but any thoughts or suggestions? Is anything incorrect, or does anything need a better description?
Links, credits, logos? I do suggest reading the argos-train GitHub page and watching the video first, and I try to link/credit as much as possible.

I was also thinking of an “advanced” section, covering things like iterations (see “The best way to train models?”).

argosopentech · May 9, 2022, 11:03am

I think a good written tutorial could be very helpful!

I’m hoping to add more documentation to Argos Train either as a /docs folder of markdown files or as a readthedocs.org page.

jefs42 · May 9, 2022, 6:38pm

I figured I’d finish this as-is while I do my new Danish training. But of course you can take any or all of it for a central docs location.

Markdown did occur to me while I was posting this, basically what you mean: I could make it a repository where others could submit edits as pull requests. I also found https://github.com/thephpleague/commonmark, which could turn it back into an HTML webpage.

I’d also like to split it up eventually. Step 1 (preparing the argosdata files) and step 3 (the actual training of the model) are basically the same everywhere; it’s only step 2, setting up the system/environment, that could have different how-tos:

The video’s approach: renting a vast.ai machine with the Docker image (what I’m currently doing)
Own Linux system, loading the Docker image (no clue yet)
Own Linux system, set up semi-manually without Docker (I did this, but on a PC without a CUDA GPU)

Actually, I guess step 3 might vary too, depending on where step 2 leaves off…

And then a step 4: what to do with it all once you’re done.

pierotofy · May 11, 2022, 2:18am

+1 for a written tutorial! I think it would help many others.

jefs42 · May 11, 2022, 4:32am

Thanks. And really… thanks to the community and all the help here.

The general instructions on GitHub and the video tutorial got me pretty close, but it was the helpful replies here about various details that got me to my first .argosmodel.

So after redoing it a few times, it occurred to me that collecting those extra details, which are scattered across various threads here, into more detailed A-to-Z instructions might be useful to others, as it has been to me.

The current vast.ai version is mostly done (ready for review, edits, corrections…).

Still to come: retraining the same data source in the other direction (I think just removing /run/source and /run/target might be enough, or everything other than /run/cache to avoid re-downloading), and then what to do with your fancy new argosdata and model afterwards:

I wrote a small Python script for myself that installs all .argosmodel files in the current directory into argostranslate, which then makes them available to a local LibreTranslate server (note: run it as the same user the LibreTranslate server runs as, then restart the server).
Then… I don’t know. Post links to your new argosdata and argosmodel files here? Submit them to GitHub?
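For the reverse-direction retraining mentioned above, here is a minimal sketch of clearing out the run directory while keeping the cache. It assumes `run_dir` points at the run directory referred to in the post (adjust the path to wherever your setup keeps it), and, as the post itself only guesses, it is untested whether keeping just cache/ is enough. `reset_run_dir` is a hypothetical helper, not part of argos-train, and it deletes files, so double-check the path first.

```python
import shutil
from pathlib import Path


def reset_run_dir(run_dir, keep=("cache",)):
    """Delete every entry in run_dir except those named in `keep`.

    Hypothetical helper: assumes run_dir is the argos-train run directory
    and that its cache/ subfolder holds downloads worth keeping.
    Returns the names of the removed entries, sorted.
    """
    run_path = Path(run_dir)
    removed = []
    for entry in sorted(run_path.iterdir()):
        if entry.name in keep:
            continue  # keep cached downloads so they are not fetched again
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # e.g. source/ and target/
        else:
            entry.unlink()  # plain files and symlinks
        removed.append(entry.name)
    return removed
```

Calling `reset_run_dir("run")` would leave only `run/cache` in place before you start training the opposite language pair.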
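The install script described above isn’t posted, so this is only a sketch of the same idea, not the author’s actual script. It uses `argostranslate.package.install_from_path`, the function argostranslate’s own docs use to install a downloaded package; `find_argosmodels` and `install_all` are hypothetical names. The `installer` parameter exists only so the logic can be exercised without argostranslate installed; normally you would leave it as `None`.

```python
from pathlib import Path


def find_argosmodels(directory="."):
    """Return the .argosmodel files in `directory`, sorted by name."""
    return sorted(p for p in Path(directory).glob("*.argosmodel") if p.is_file())


def install_all(directory=".", installer=None):
    """Install every .argosmodel file in `directory`; return the paths installed.

    By default this uses argostranslate.package.install_from_path. Run it as
    the same user the LibreTranslate server runs as, then restart the server.
    """
    if installer is None:
        # Imported lazily so find_argosmodels works without argostranslate.
        import argostranslate.package

        installer = argostranslate.package.install_from_path
    models = find_argosmodels(directory)
    for model in models:
        installer(model)
    return models
```

Calling `install_all()` from the folder holding your freshly trained models, as the LibreTranslate user, and then restarting the server mirrors the steps in the post.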

I really need to “borrow” the play PC back from the kids to write non-vast.ai instructions. My work PC is fine for work; it just doesn’t meet the GPU requirements to finish the job.

Further down the line… argos-train-init assumes an apt-based system. Perhaps instructions and/or alternative init scripts, the way a lot of C projects ship *nix, Mac and Windows instructions or build files? Something like an argos-train-rpm-init. And some other recent Linux distribution I tried had something different, neither apt nor rpm (I forget which); a bit more bare-bones, like Debian vs Ubuntu…
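One way alternative init scripts could be chosen is by checking which package manager is on the PATH. This is a hypothetical sketch: only argos-train-init exists today, `argos-train-rpm-init` is just the name floated above, and the mapping and function names are illustrative assumptions. A distribution using neither apt nor rpm simply gets `None`.

```python
import shutil

# Hypothetical mapping from package-manager binary to init script name.
# Only argos-train-init exists upstream; argos-train-rpm-init is illustrative.
INIT_SCRIPTS = {
    "apt-get": "argos-train-init",
    "dnf": "argos-train-rpm-init",
    "yum": "argos-train-rpm-init",
}


def detect_package_manager(which=shutil.which):
    """Return the first known package manager found on PATH, or None."""
    for tool in INIT_SCRIPTS:  # dicts keep insertion order, so apt-get is checked first
        if which(tool):
            return tool
    return None


def pick_init_script(which=shutil.which):
    """Return the init script name for this system, or None if unsupported."""
    tool = detect_package_manager(which)
    return INIT_SCRIPTS[tool] if tool else None
```

The `which` parameter defaults to `shutil.which`; it is only there so the lookup can be tested without the real tools installed.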