We are looking for all Indian Languages,
Kannada, Telugu, Tamil, Malayalam, Marathi, Gujarati, Odia, Kashmiri etc.
Thanks and Regards,
Srikanth
I advise you to train your own models using Locomotive. You may find sources on GitHub, and training data on OPUS (I'll be updating the language codes soon; they are listed in data.py).
Hey @NicoLe, I'm interested in giving model training a try. The Locomotive guide was pretty straightforward, and I tried converting the OPUS models. They gave me scores of less than 20, but their README listed scores near the 90s?
python opus_mt_convert.py -s en -t ta
downloaded from OPUS-MT-models/en-ta/opus-2019-12-04.zip
python eval.py --config run/en_ta-opus_1.9/config.json
BLEU score: 3.39328
I also find it a bit weird that it kept doing a BLEU evaluation even though I didn't request it. I wasn't able to do a trial run.
Any pointers on how to proceed? Should I try training from scratch?
BLEU is the default metric of the eval script: it will always yield a BLEU score regardless.
Your post is not completely clear about:
I would run two evals with the arguments --comet and --flores_dataset dev or devtest, to get:
1. COMET scores (above 0.8 is generally not bad, under 0.7 is really bad);
2. two sets of scores, to see whether the model is really out of whack or whether it's an accident on the default evaluation dataset.
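The two runs suggested above could be scripted roughly as follows. This is only a sketch: the flags are the ones named in this thread, but passing --comet and --flores_dataset together with --config in one invocation is an assumption, and the helpers `eval_commands` and `comet_verdict` are hypothetical names that just encode the rough thresholds quoted above.

```python
import shlex

# Config path taken from the earlier post in this thread.
CONFIG = "run/en_ta-opus_1.9/config.json"


def eval_commands(config, splits=("dev", "devtest")):
    """Build one eval.py command per FLORES split.

    Assumption: eval.py accepts --comet and --flores_dataset alongside --config.
    """
    return [
        ["python", "eval.py", "--config", config,
         "--comet", "--flores_dataset", split]
        for split in splits
    ]


def comet_verdict(score):
    """Rough reading of a COMET score, using the thresholds quoted above."""
    if score > 0.8:
        return "generally not bad"
    if score < 0.7:
        return "real bad"
    return "in between"


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in eval_commands(CONFIG):
        print(shlex.join(cmd))
```

If both splits land in the same verdict band, the low score is probably not an accident of one evaluation set.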
As for training... first, you should identify good sources,
Over the course of the last two years, I've devised extra data processing which is not public and is quite time-consuming (at least as long as training) to get professionally useful quality, but if you get your sources right, you shouldn't end up too far from it.
Then, depending on your GPU's capacity, choose an architecture for your model. Vanilla does not need much VRAM, but it does not yield very good models. The architectures I train need 25 to 40 GB of VRAM, so a gaming GPU may not be enough. If your equipment has that much oomph, I'll tell you which parameters work without too much tweaking.
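To check whether a machine is in that range, something like the sketch below may help. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the helper names are hypothetical, and the 25 and 40 GB figures are simply the ones quoted above, compared loosely against the MiB values that `nvidia-smi` reports.

```python
import subprocess

SMI_QUERY = ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"]


def parse_vram_mib(smi_output):
    """Parse the query output: one total-memory value in MiB per GPU line."""
    values = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and entries such as "[N/A]"
            values.append(int(line))
    return values


def fits(vram_mib, needed_gb):
    """True if the GPU has at least `needed_gb` of memory (MiB / 1024)."""
    return vram_mib / 1024 >= needed_gb


def query_vram_mib():
    """Return total VRAM per GPU in MiB, or [] if nvidia-smi is unavailable."""
    try:
        result = subprocess.run(SMI_QUERY, capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    for index, mib in enumerate(query_vram_mib()):
        print(f"GPU {index}: {mib / 1024:.1f} GB, "
              f"meets 25 GB: {fits(mib, 25)}, meets 40 GB: {fits(mib, 40)}")
```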