A few notes on them: the OPUS data would not train above ~45% accuracy, and when we checked some of the data in there it was not good, so we took the sentences (over 20M) and retranslated them through Facebook's NLLB. NSFW content was filtered out.
The en > ky model training came out with an accuracy of 81.1 and a ppl of 7.4; the BLEU score was 75.6, but translations came out pretty decent. It does well on formal text, but not so well on short conversational text. However, it does not hallucinate and generally gives good output.
The ky > en model training came out with an accuracy of 70 and a ppl of 13.1, and the BLEU score was 65.6. It was just a reverse training, but a Stanza model for Kyrgyz didn't exist, so I substituted Russian. There may have been a better choice.
Франциянын Мальтага кол салуусу (Мальтезия: Invażjoni Franqueiża ta’ Malta, французча: Débarquement Français à Malte) Мальта жана Гозо аралдарына ийгиликтүү кол салуу болгон, андан кийин 1798-жылдын июнь айында Наполеон Бонапарт жетектеген Франциянын Биринчи Республикасы тарабынан Франциянын революциялык согуштарынын Жер Ортолук деңиз кампаниясынын алкагында башкарылган.
Backtranslation:
Couldn’t run the backtranslation as-is because there’s an issue with the stanza model configuration for ky => en :
  File "/Users/pt/Documents/LibreTranslate/venv/lib/python3.9/site-packages/stanza/pipeline/core.py", line 85, in __init__
    if len(self.load_list) == 0: raise Exception('No processor to load. Please check if your language or package is correctly set.')
Exception: No processor to load. Please check if your language or package is correctly set.
After changing the “ru” key in stanza/resources.json to “ky” and renaming the folder from “ru” to “ky” in stanza/ the model runs fine:
The French invasion of Malta (Maltesia: Invażjoni Franqueiża ta’ Malta, French: Débarquement Français à Malte) was a successful attack on Malta and the Gozo Islands, followed by the First Republic of France led by Napoleon Bonaparte in June 1798 under the Mediterranean Campaign of French Revolutionary Wars.
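The workaround described above (re-labelling the "ru" entry as "ky") could be scripted roughly as follows. This is only a sketch: it assumes the directory layout mentioned in the post (a resources.json next to one folder per language code), and the function name and parameters are made up for illustration. It edits files in place, so it is safer to try it on a copy first.

```python
import json
from pathlib import Path


def alias_stanza_language(resources_dir, source="ru", target="ky"):
    """Re-label an existing language entry and its model folder.

    Assumption: resources_dir contains resources.json plus one folder per
    language code, as described in the post. Not part of Stanza itself.
    """
    resources_dir = Path(resources_dir)
    index_path = resources_dir / "resources.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))

    if source not in index:
        raise KeyError(f"{source!r} not found in {index_path}")
    if target in index:
        raise ValueError(f"{target!r} already exists in {index_path}")

    # Move the entry to the new key, leaving its contents untouched.
    index[target] = index.pop(source)
    index_path.write_text(
        json.dumps(index, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # Rename the model folder so it matches the new key.
    (resources_dir / source).rename(resources_dir / target)
```

Note that this removes the "ru" entry, exactly like the manual edit, so Russian would no longer resolve from that resources directory.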
It looks like ky was added to the language list; was it this model? It has a mapping from every language to ky, but they error when testing. Should this map only be en>ky and ky>en, not all language codes? https://libretranslate.com/languages Or is that standard procedure?
Yes it’s this model. LT will automatically pivot via English if a direct language to language model is not available, e.g. ky => it becomes ky => en => it.
What error are you getting and how are you triggering it?
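To illustrate the pivoting behaviour, here is a minimal sketch of how a translation route could be chosen. It is not LibreTranslate's or Argos Translate's actual code; the function name and the set of installed pairs are hypothetical.

```python
def translation_path(source, target, direct_pairs, pivot="en"):
    """Return the list of (from, to) hops needed to translate source -> target.

    Prefers a direct model; otherwise tries to go through the pivot language.
    Returns an empty list when no route exists.
    """
    if (source, target) in direct_pairs:
        return [(source, target)]
    if (source, pivot) in direct_pairs and (pivot, target) in direct_pairs:
        return [(source, pivot), (pivot, target)]
    return []


# Hypothetical set of installed direct models.
installed = {("ky", "en"), ("en", "ky"), ("en", "it"), ("it", "en")}

print(translation_path("ky", "en", installed))  # [('ky', 'en')]
print(translation_path("ky", "it", installed))  # [('ky', 'en'), ('en', 'it')]
```

This is why ky shows up as a target for every language in the list even though only en>ky and ky>en were trained.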
I’m going to try to upload these packages to the Argos Translate package index soon. I’m seeing this issue with the ky->en model on my local Argos Translate installation:
checkpoint = torch.load(filename, lambda storage, loc: storage)
Traceback (most recent call last):
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslategui/gui.py", line 396, in load_languages
    self.languages = translate.load_installed_languages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 636, in load_installed_languages
    return get_installed_languages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 521, in get_installed_languages
    packages = package.get_installed_packages()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/package.py", line 327, in get_installed_packages
    to_return.append(Package(path))
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/package.py", line 193, in __init__
    raise FileNotFoundError(
FileNotFoundError: Error opening package at /home/pj/.local/share/argos-translate/packages/__MACOSX/metadata.json no metadata.json
Aborted (core dumped)
I think Argos Translate automatically looks for metadata.json in every directory in .local/share/argos-translate/packages, so installing this package with __MACOSX zipped into the argosmodel breaks things. This should be a pretty easy fix: delete __MACOSX and rezip.
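Since an .argosmodel file is a zip archive, the delete-and-rezip step could also be done without unpacking by hand. A rough sketch using only the standard library follows; the function name is made up, and dropping .DS_Store files as well is my own addition rather than something the error above requires.

```python
import zipfile


def strip_macos_entries(src_path, dst_path):
    """Copy a zip archive to dst_path, skipping macOS Finder artifacts.

    Returns the list of entry names that were left out.
    """
    removed = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            parts = info.filename.split("/")
            # Skip anything under __MACOSX/ and any stray .DS_Store file.
            if "__MACOSX" in parts or parts[-1] == ".DS_Store":
                removed.append(info.filename)
                continue
            # Reusing the ZipInfo keeps each entry's name and compression type.
            dst.writestr(info, src.read(info.filename))
    return removed
```

Usage would be along the lines of strip_macos_entries("model.argosmodel", "model-clean.argosmodel"), where both file names are placeholders.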
Maybe I should also look at an Argos Translate source change to be more tolerant of this and not crash.
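As a rough idea of what "more tolerant" could mean, the sketch below scans a packages directory and skips any subdirectory without a metadata.json instead of raising. This is not Argos Translate's code; the function name and return shape are hypothetical.

```python
from pathlib import Path


def find_package_dirs(packages_dir):
    """Split subdirectories into those with a metadata.json and those without.

    Returns (valid, skipped) so the caller can load the valid ones and
    log a warning about the rest instead of crashing.
    """
    valid, skipped = [], []
    for entry in sorted(Path(packages_dir).iterdir()):
        if not entry.is_dir():
            continue  # ignore loose files in the packages directory
        if (entry / "metadata.json").is_file():
            valid.append(entry)
        else:
            skipped.append(entry)
    return valid, skipped
```

With something like this, a leftover __MACOSX folder would end up in the skipped list rather than aborting the whole language load.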