“Killed” normally means you ran out of RAM; you can try adding swap space.
The max data size config exists to prevent this problem, since the data is loaded into memory during training. I’ve had good results just excluding the largest datasets, since they’re often lower quality anyway and tend to cause problems.
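This isn’t the actual argos-train mechanism, just a rough sketch of how you could skip the biggest data packages before training; the directory path and size cutoff below are made up for illustration:

```python
from pathlib import Path

# Hypothetical location of the downloaded OPUS data packages (zip files).
DATA_DIR = Path("data")
# Made-up cutoff: skip anything over ~2 GiB since it won't fit comfortably in RAM.
MAX_BYTES = 2 * 1024**3

def select_packages(data_dir: Path, max_bytes: int) -> list[Path]:
    """Return the data packages small enough to load into memory."""
    keep, skip = [], []
    for zip_path in sorted(data_dir.glob("*.zip")):
        (keep if zip_path.stat().st_size <= max_bytes else skip).append(zip_path)
    for zip_path in skip:
        print(f"Skipping {zip_path.name} ({zip_path.stat().st_size / 1024**3:.1f} GiB)")
    return keep

if __name__ == "__main__":
    for pkg in select_packages(DATA_DIR, MAX_BYTES):
        print("Will train on:", pkg.name)
```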
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 67: invalid continuation byte
Is there a problem in the data from OPUS itself? Or is it the pipeline: download from OPUS, adjust, make a zip, rename it, upload to place A, then training downloads it to place B… and the data is somehow getting slightly mangled in one of those transfers?
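One way to narrow it down is to check whether the raw OPUS text already contains invalid UTF-8 before any of the re-zip/upload steps. A quick sketch (the file name is just a placeholder for whichever extracted OPUS file you want to check):

```python
def find_bad_utf8_lines(path: str, limit: int = 10) -> None:
    """Report the first few lines in a file that aren't valid UTF-8."""
    found = 0
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                print(f"line {lineno}: {err}")
                # Show the raw bytes around the offending position.
                print(f"  context: {raw[max(err.start - 10, 0):err.start + 10]!r}")
                found += 1
                if found >= limit:
                    break
    if found == 0:
        print("No invalid UTF-8 found.")

# Placeholder path: point this at the extracted OPUS file before re-zipping it.
find_bad_utf8_lines("CCAligned.en-ro.ro")
```

If the freshly downloaded OPUS file already fails this check, the problem starts at the source rather than in the transfers.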
“baby character encoding”? Not familiar with that term.
Got this with a few data sources from OPUS for Romanian. I want to say CCAligned and OpenSubtitles. But also with CCMatrix for Norwegian.
Not sure which component is raising it - argos-train, SentencePiece, CTranslate2, etc…
And I know a downloaded zip could get slightly mangled but still be extractable (I do usually run a test on the zip). Then extract, change some things, re-zip, upload. Then training downloads it and extracts…
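If you want to rule out transfer corruption, one generic option (not anything built into argos-train) is to hash the zip before uploading and again after the training machine downloads it; matching digests mean the transfers didn’t touch the bytes. The file name here is a placeholder:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Run this on the machine that uploads and again on the machine that trains;
# the placeholder name stands in for the actual data package.
print(sha256_of("data-ccmatrix-en_ro.zip"))
```

For the archive itself, `zipfile.ZipFile(path).testzip()` in the standard library does roughly what a manual zip test does, returning the name of the first corrupt member or None if everything checks out.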
I assume it’s GIGO of some sort. I just don’t know whether it starts with the OPUS data itself or in one of the steps between downloading from OPUS and running the training. Going to try redoing Romanian from scratch: maybe train on just one OPUS source first, then add a second if that works, then a third, etc.