Hi all,
I’ve just discovered LibreTranslate today and I’m very keen to try it out. We have our own dedicated environment with enough GPU power, so we’d like to run it and, more importantly, train it within that environment. We’d use our own corpora, which is one more argument for using our own environment. If I understand correctly, the video tutorial for training a new language model shows it being done on vast.ai. So, can we use our environment for training as well?
Secondly, our aim is to use it as a backend service with the API enabled, which we would integrate with our translation tools. This is also possible, right?
Best, seba
The tutorial presents vast.ai because it’s easy to use for people who don’t have a GPU, but if you have an environment with a GPU you can follow the tutorial exactly, minus the vast.ai-specific steps.
Then you can install the resulting model package and it will be available to Argos Translate and LibreTranslate.
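For reference, a minimal sketch of that install step in Python. The helper below only lists trained package files in a directory (the file names and paths are illustrative); the actual install is done with `argostranslate.package.install_from_path`, which is part of the argostranslate Python API.

```python
from pathlib import Path


def find_model_packages(directory: str) -> list:
    """Return trained .argosmodel package files in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.argosmodel"))


# Installing a package (requires argostranslate; call shown for reference):
#   import argostranslate.package
#   argostranslate.package.install_from_path(str(package_path))
# Once installed, the model is available to both Argos Translate and LibreTranslate.
```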
So, just to confirm: we can train the model without exposing our proprietary data outside our environment?
Yes, argos-train can be run on your own computer.
So long as you respect the terms of the AGPLv3 license (LibreTranslate/LICENSE at main · LibreTranslate/LibreTranslate · GitHub), yes. If you modify the software you’ll need to make the source code of your modifications available to your users. (I’m not a lawyer and this does not constitute legal advice.)
Please excuse my poor wording. We wouldn’t change the software at all. We would train LibreTranslate with our own data and have it running separately. Then, our own translation tool would call LibreTranslate API. There wouldn’t be any modification or integration. I was referring to integration from the process point of view.
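For anyone wiring up a similar integration: the call is a plain HTTP POST to LibreTranslate’s /translate endpoint, which returns a JSON body with a `translatedText` field. A minimal client sketch using only the Python standard library (the base URL and the api_key handling are assumptions about your particular deployment):

```python
import json
import urllib.request


def build_translate_request(base_url, text, source, target, api_key=None):
    """Build a POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:  # only needed if the instance enforces API keys
        payload["api_key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage against a local instance:
#   req = build_translate_request("http://localhost:5000", "Hello", "en", "de")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["translatedText"])
```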
I have two follow-up questions.
- Is it possible to set it up so that we get back bilingual documents? That would make it easier for translators to check the machine translations. Ideally this would feed into a feedback learning loop, if possible.
- Is it possible to hire someone for the initial setup in our environment? If so, is this the place to look for them?
Hi there,
It is perfectly possible. You need:
- a Python dependency mirror in your environment if you want to isolate it from the Internet
- a “Locomotive” server for training the models (this requires GPU)
- a “LibreTranslate” instance, which does not require a GPU for inference; it can serve up to four 1,000-character requests per second on a single CPU core
If you want to use your own models, I advise creating a dedicated user to run LibreTranslate. You will need to copy the packages from root/.local/share/argos-translate/packages into the same path under that user’s home directory; that way, updating LibreTranslate or argostranslate will not change the models used in production.
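To make the copy step concrete: Argos Translate keeps each user’s installed packages under ~/.local/share/argos-translate/packages. A small sketch of the path logic, assuming a dedicated service user named “libretranslate” (that user name is an assumption):

```python
from pathlib import Path


def argos_packages_dir(home: str) -> Path:
    """Default Argos Translate package directory under a given home directory."""
    return Path(home) / ".local" / "share" / "argos-translate" / "packages"


# Copy models trained as root to the dedicated service user, e.g.:
#   import shutil
#   shutil.copytree(argos_packages_dir("/root"),
#                   argos_packages_dir("/home/libretranslate"),
#                   dirs_exist_ok=True)
```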
The pipeline is as follows:
- train models on the Locomotive server (Windows for convenience, but it’s OS-agnostic); the best metrics to follow are val.BLEU and, to a lesser extent, perplexity (ppl)
- transfer them to the LibreTranslate instance using pscp
- install them with the Python “file_to_package.py” script (see there)
- copy them to the user directory (removing existing packages first)
- restart the service
I can help set up the environment. You can email me at [email protected]