New Argos model en_ky

Hello,

Here are models I trained on en > ky and ky > en.

A few notes on them: the OPUS data would not train past ~45% accuracy, and when we checked some of the data in there, it was not good… so we took the sentences (over 20M) and retranslated them with Facebook’s NLLB. NSFW content was filtered out.

The en > ky model finished training with an accuracy of 81.1 and a perplexity (ppl) of 7.4; the BLEU score was 75.6, and translations came out pretty decent. It does well on formal text, but not so well on short conversational text. However, it does not hallucinate and generally gives good output.

The ky > en model finished training with an accuracy of 70 and a ppl of 13.1; the BLEU score was 65.6. It was just a reverse training, but a Stanza model for Kyrgyz didn’t exist, so I substituted Russian. There may have been a better choice.

Here’s a link to the models: GitHub - christopherpickering/argos_models

This is awesome @christopherpickering, thanks for sharing. :folded_hands:

I’ve bumped your account’s trust level; in the future you should be able to post links.

The French invasion of Malta (Maltese: Invażjoni Franċiża ta’ Malta , French: Débarquement Français à Malte ) was the successful invasion of the islands of Malta and Gozo, then ruled by the Order of St. John, by the French First Republic led by Napoleon Bonaparte in June 1798 as part of the Mediterranean campaign of the French Revolutionary Wars.

Франциянын Мальтага кол салуусу (Мальтезия: Invażjoni Franqueiża ta’ Malta, французча: Débarquement Français à Malte) Мальта жана Гозо аралдарына ийгиликтүү кол салуу болгон, андан кийин 1798-жылдын июнь айында Наполеон Бонапарт жетектеген Франциянын Биринчи Республикасы тарабынан Франциянын революциялык согуштарынын Жер Ортолук деңиз кампаниясынын алкагында башкарылган.

Backtranslation:

I couldn’t run the backtranslation as-is because there’s an issue with the Stanza model configuration for ky => en:

  File "/Users/pt/Documents/LibreTranslate/venv/lib/python3.9/site-packages/stanza/pipeline/core.py", line 85, in __init__
    if len(self.load_list) == 0: raise Exception('No processor to load. Please check if your language or package is correctly set.')
Exception: No processor to load. Please check if your language or package is correctly set.

After changing the “ru” key in stanza/resources.json to “ky” and renaming the folder from “ru” to “ky” in stanza/ the model runs fine:

The French invasion of Malta (Maltesia: Invażjoni Franqueiża ta’ Malta, French: Débarquement Français à Malte) was a successful attack on Malta and the Gozo Islands, followed by the First Republic of France led by Napoleon Bonaparte in June 1798 under the Mediterranean Campaign of French Revolutionary Wars.
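The manual workaround described above (moving the “ru” entry in stanza/resources.json to “ky” and renaming the matching folder) could be scripted roughly as follows. This is only a sketch: the function name `relabel_stanza_language` is made up for illustration, and it assumes the extracted package has a stanza/ directory holding both resources.json and the language folder, as the post describes.

```python
import json
from pathlib import Path


def relabel_stanza_language(stanza_dir: Path, old: str = "ru", new: str = "ky") -> None:
    """Move the `old` entry in resources.json to `new` and rename its folder.

    Hypothetical helper; the layout (resources.json next to the language
    folder) is assumed from the description in the thread.
    """
    resources_path = stanza_dir / "resources.json"
    resources = json.loads(resources_path.read_text(encoding="utf-8"))
    if old not in resources:
        raise KeyError(f"{old!r} not found in {resources_path}")

    # Re-key the entry, leaving every other language entry untouched.
    resources[new] = resources.pop(old)
    resources_path.write_text(
        json.dumps(resources, ensure_ascii=False), encoding="utf-8"
    )

    # Rename the model folder so it matches the new key.
    (stanza_dir / old).rename(stanza_dir / new)
```

Called as `relabel_stanza_language(Path("path/to/package/stanza"))`, it performs both steps in one go; the path here is a placeholder.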

thanks! I updated w/ the link.

thanks for catching that, I’m pushing that update to GitHub.

It looks like ky was added to the language list; was it this model? It has a mapping from every language to ky, but those pairs error when I test them. Should this map only be en>ky and ky>en, not all language codes?
https://libretranslate.com/languages Or is that standard procedure?

Yes it’s this model. LT will automatically pivot via English if a direct language to language model is not available, e.g. ky => it becomes ky => en => it.

What error are you getting and how are you triggering it?
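To make the pivoting behaviour concrete, here is a toy sketch of the routing decision. It is not LibreTranslate’s or Argos Translate’s actual code; `plan_route` and the set of installed pairs are invented for illustration.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def plan_route(src: str, tgt: str, installed: Set[Pair], pivot: str = "en") -> List[Pair]:
    """Return the model hops needed to translate src -> tgt.

    Uses a direct model when one is installed, otherwise falls back to
    two hops through the pivot language.
    """
    if (src, tgt) in installed:
        return [(src, tgt)]
    if (src, pivot) in installed and (pivot, tgt) in installed:
        return [(src, pivot), (pivot, tgt)]
    raise LookupError(f"no route from {src} to {tgt}")


# Hypothetical set of installed models.
installed_pairs = {("en", "ky"), ("ky", "en"), ("en", "it"), ("it", "en")}
route = plan_route("ky", "it", installed_pairs)
```

With those pairs, `route` is the two hops ky => en and en => it, which is why every language appears to map to ky even though only the en and ky models were trained.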

oh, very nice!

It’s translating en>ky no problem (I copied the top 3 paragraphs from Kyrgyzstan - Wikipedia), but if I click reverse, then I get this error:

I enabled debug and then remembered my local model had the wrong stanza :smiley:

I added the corrected model and it works great.

I’m running the latest Docker image (just re-pulled) and then venv/bin/python scripts/install_models.py, but the ky model doesn’t download.

Sorry for the beginner questions, I’m new to the project :slight_smile:

Until the model is published on the argospm index (GitHub - argosopentech/argospm-index: Argos Translate package index) people will need to install it manually. No worries, it’s a good question!
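For anyone installing manually from Python, a minimal sketch might look like the following. It assumes the installed Argos Translate version exposes `argostranslate.package.install_from_path`; the wrapper `install_local_model` and any file name passed to it are placeholders, not something from this thread.

```python
from pathlib import Path


def install_local_model(model_path: str) -> None:
    """Install a downloaded .argosmodel file into the local Argos Translate setup."""
    path = Path(model_path)
    if path.suffix != ".argosmodel":
        raise ValueError(f"expected a .argosmodel file, got {path.name}")
    if not path.is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the checks above work even without Argos Translate installed.
    from argostranslate import package

    package.install_from_path(path)
```

Run it with the interpreter from the same virtual environment LibreTranslate uses, so the package lands in the directory that installation reads.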

I’m going to try to upload these packages to the Argos Translate package index soon. I’m seeing this issue with the ky->en model on my local Argos Translate installation:

  checkpoint = torch.load(filename, lambda storage, loc: storage)
Traceback (most recent call last):
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslategui/gui.py", line 396, in load_languages
    self.languages = translate.load_installed_languages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 636, in load_installed_languages
    return get_installed_languages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 521, in get_installed_languages
    packages = package.get_installed_packages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/package.py", line 327, in get_installed_packages
    to_return.append(Package(path))
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/package.py", line 193, in __init__
    raise FileNotFoundError(
FileNotFoundError: Error opening package at /home/pj/.local/share/argos-translate/packages/__MACOSX/metadata.json no metadata.json
Aborted (core dumped)

I think Argos Translate automatically looks for metadata.json in every directory in .local/share/argos-translate/packages, so installing this package with a __MACOSX folder zipped into the argosmodel breaks things. This should be a pretty easy fix: delete __MACOSX and rezip.

Maybe I should also look at an Argos Translate source change to be more tolerant of this and not crash.
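One way such a tolerance check could look, as a sketch only: skip any directory that has no metadata.json instead of raising. `find_package_dirs` is a hypothetical helper and not the actual Argos Translate code path shown in the traceback.

```python
from pathlib import Path
from typing import List


def find_package_dirs(packages_root: Path) -> List[Path]:
    """Return only the subdirectories that look like installed packages.

    Directories without a metadata.json (for example a stray __MACOSX
    folder) are ignored rather than treated as an error.
    """
    return sorted(
        entry
        for entry in packages_root.iterdir()
        if entry.is_dir() and (entry / "metadata.json").is_file()
    )
```

The trade-off is that a genuinely broken package would be skipped silently, so logging the ignored directories would probably be worthwhile.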

I deleted __MACOSX and rezipped and it works
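The delete-and-rezip step can also be done without unpacking by copying the archive and leaving out the __MACOSX entries. This is a sketch using only the standard library `zipfile` module; `strip_macosx` is a name invented here, and it relies on an .argosmodel file being an ordinary zip archive, as the thread implies.

```python
import zipfile
from pathlib import Path


def strip_macosx(src: Path, dst: Path) -> int:
    """Copy the zip at `src` to `dst`, dropping every entry under __MACOSX.

    Returns the number of entries that were dropped.
    """
    removed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # Zip entry names always use "/" as the separator.
            if "__MACOSX" in info.filename.split("/"):
                removed += 1
                continue
            zout.writestr(info, zin.read(info.filename))
    return removed
```

Creating the archive on macOS with the command-line `zip` tool rather than Finder’s Compress option generally avoids the folder in the first place.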

I just published the Kyrgyz models!

Cool! Thank you! I’ll try and watch for that file next time.
