Training an Argos Translation Model Locally on Windows

Pre-requisites

  • Good GPU (I am using an NVIDIA GeForce RTX 3060 Ti; other cards may work but may need extra setup, not covered in this guide, to enable CUDA in WSL)
  • Windows OS
  • WSL Kernel Version > 5.10.43.3
  • WSL 2
  • Python 3

Scope


Downloading Text for Training

Automatic Quick Setup

Download Text Automatically

Setup

git clone https://github.com/Interaction-Bot/opus-nlp-downloader.git
cd opus-nlp-downloader
pip install -r requirements.txt
python main.py get en th
python main.py download en th data/

Response

{
	'wikimedia': {'links': 'https://object.pouta.csc.fi/OPUS-wikimedia/v20210402/moses/en-th.txt.zip', 'sentences': 26597},
	'CCAligned': {'links': 'https://object.pouta.csc.fi/OPUS-CCAligned/v1/moses/en-th.txt.zip', 'sentences': 10746372},
	'OpenSubtitles': {'links': 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/moses/en-th.txt.zip', 'sentences': 3281533},
	'XLEnt': {'links': 'https://object.pouta.csc.fi/OPUS-XLEnt/v1.2/moses/en-th.txt.zip', 'sentences': 1236145},
	'Tanzil': {'links': 'https://object.pouta.csc.fi/OPUS-Tanzil/v1/moses/en-th.txt.zip', 'sentences': 93540},
	'QED': {'links': 'https://object.pouta.csc.fi/OPUS-QED/v2.0a/moses/en-th.txt.zip', 'sentences': 264677},
	'GNOME': {'links': 'https://object.pouta.csc.fi/OPUS-GNOME/v1/moses/en-th.txt.zip', 'sentences': 78},
	'NeuLab-TedTalks': {'links': 'https://object.pouta.csc.fi/OPUS-NeuLab-TedTalks/v1/moses/en-th.txt.zip', 'sentences': 102773},
	'bible-uedin': {'links': 'https://object.pouta.csc.fi/OPUS-bible-uedin/v1/moses/en-th.txt.zip', 'sentences': 124386},
	'TED2020': {'links': 'https://object.pouta.csc.fi/OPUS-TED2020/v1/moses/en-th.txt.zip', 'sentences': 160762}
}
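
The response can be used to decide which corpora are worth downloading. Here is a minimal sketch, assuming the dictionary has the shape printed above. The helper pick_corpora and its min_sentences threshold are hypothetical names for illustration only; they are not part of opus-nlp-downloader.

```python
def pick_corpora(response, min_sentences=1000):
    """Return (name, sentences, link) tuples, largest corpus first.

    `response` is assumed to look like the dictionary printed by
    `python main.py get en th`; `min_sentences` is an illustrative cut-off.
    """
    chosen = [
        (name, info["sentences"], info["links"])
        for name, info in response.items()
        if info["sentences"] >= min_sentences
    ]
    return sorted(chosen, key=lambda item: item[1], reverse=True)


# Three entries copied from the response shown in this guide.
response = {
    "wikimedia": {
        "links": "https://object.pouta.csc.fi/OPUS-wikimedia/v20210402/moses/en-th.txt.zip",
        "sentences": 26597,
    },
    "GNOME": {
        "links": "https://object.pouta.csc.fi/OPUS-GNOME/v1/moses/en-th.txt.zip",
        "sentences": 78,
    },
    "TED2020": {
        "links": "https://object.pouta.csc.fi/OPUS-TED2020/v1/moses/en-th.txt.zip",
        "sentences": 160762,
    },
}

for name, sentences, link in pick_corpora(response):
    print(f"{name}: {sentences} sentences -> {link}")
```

With the default threshold the tiny GNOME corpus (78 sentences) is skipped, which is probably what you want for training.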

Alternative - Collect Translation Texts

Opus project

  • Gather data from above link
  • Get English Text
  • Get Thai Text - the Thai translation of the English text, aligned line by line
  • Get License information

Creating an Argos Data Package

  • Folder structure
data-<dataSource>-<codeFrom>_<codeTo>
	metadata.json
	README
	source
	target
  • metadata.json
{
	"name": "<dataSource>",
	"type": "data",
	"from_code": "<codeFrom>",
	"to_code": "<codeTo>",
	"size": <sentences>,
	"reference": ""
}
  • Zip folders and change extension to .argosdata
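
The packaging steps above can be sketched in Python. This is an illustration under stated assumptions, not an official Argos tool: build_argosdata is a hypothetical helper, the README text is a placeholder for the license information, and I am assuming that source and target must have the same number of lines and that the folder stays as the top-level entry inside the zip.

```python
import json
import zipfile
from pathlib import Path


def build_argosdata(out_dir, name, from_code, to_code, source_lines, target_lines):
    """Write a data package folder and zip it into a .argosdata file."""
    # Assumption: source and target are parallel, one sentence per line.
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")

    folder_name = f"data-{name}-{from_code}_{to_code}"
    folder = Path(out_dir) / folder_name
    folder.mkdir(parents=True, exist_ok=True)

    metadata = {
        "name": name,
        "type": "data",
        "from_code": from_code,
        "to_code": to_code,
        "size": len(source_lines),
        "reference": "",
    }
    (folder / "metadata.json").write_text(json.dumps(metadata, indent=4), encoding="utf-8")
    # Placeholder text: put the corpus license information here.
    (folder / "README").write_text("License information goes here.\n", encoding="utf-8")
    (folder / "source").write_text("\n".join(source_lines) + "\n", encoding="utf-8")
    (folder / "target").write_text("\n".join(target_lines) + "\n", encoding="utf-8")

    # A .argosdata file is the zipped folder with a different extension.
    package_path = Path(out_dir) / f"{folder_name}.argosdata"
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(folder.iterdir()):
            # Keep the folder as the top-level entry inside the archive.
            archive.write(file, arcname=f"{folder_name}/{file.name}")
    return package_path
```

For example, calling build_argosdata("packages", "wikimedia", "en", "th", english_lines, thai_lines) would write packages/data-wikimedia-en_th.argosdata, where english_lines and thai_lines are lists of sentences you have loaded yourself.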

Hosting the Package Locally

  • Add all .argosdata files to a folder
  • Install Python 3
  • In the folder run
python3 -m http.server
  • Links added to data-index.json inside the Docker container will have the format
http://host.docker.internal:8000/<your-file>.argosdata
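
To avoid typing the links by hand, something like the following could list them for every package in the hosting folder. It is only a sketch: argosdata_links is a hypothetical helper, and it assumes the server was started with the default port of python3 -m http.server (8000).

```python
from pathlib import Path
from urllib.parse import quote


def argosdata_links(folder, host="host.docker.internal", port=8000):
    """Return one URL per .argosdata file in `folder`, sorted by file name."""
    return [
        # quote() escapes characters such as spaces in file names.
        f"http://{host}:{port}/{quote(path.name)}"
        for path in sorted(Path(folder).glob("*.argosdata"))
    ]
```

Running argosdata_links(".") in the folder you are serving returns the list of links to paste into the data index.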

Setting up Nvidia GPU to use CUDA

Running Argos Train on Docker

docker run --gpus all -it --name argostrain argosopentech/argostrain /bin/bash

If the container already exists, start it again and attach to it

docker start -ai argostrain

Initialize

su argosopentech
source ~/argos-train-init

For each .argosdata file from Downloading Text for Training, add an entry like this to data-index.json

{
	"name": "<dataSource>",
	"type": "data",
	"from_code": "<codeFrom>",
	"to_code": "<codeTo>",
	"size": <size>,
	"reference": "",
	"links": [
		"<linkToArgosdataFile>"
	]
}
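
If you have many packages, the entries can be added with a small script instead of by hand. The sketch below assumes data-index.json holds a JSON array of entries in the format shown above; add_to_data_index is a hypothetical helper and the path you pass to it depends on where the file lives in your container.

```python
import json
from pathlib import Path


def add_to_data_index(index_path, name, from_code, to_code, size, link):
    """Add (or replace) one data package entry in data-index.json."""
    index_path = Path(index_path)
    # Assumption: the index is a JSON array; start empty if the file is missing.
    if index_path.exists():
        entries = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        entries = []

    entry = {
        "name": name,
        "type": "data",
        "from_code": from_code,
        "to_code": to_code,
        "size": size,
        "reference": "",
        "links": [link],
    }
    # Drop any earlier entry for the same package so reruns do not duplicate it.
    key = (name, from_code, to_code)
    entries = [
        e for e in entries
        if (e.get("name"), e.get("from_code"), e.get("to_code")) != key
    ]
    entries.append(entry)

    index_path.write_text(
        json.dumps(entries, indent=4, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return entry
```

Replacing an existing entry rather than appending blindly means the script can be rerun after rebuilding a package with a different sentence count.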

Train

argos-train

You will then get some prompts, for English to Thai enter the following:

From code (ISO 639): en
To code (ISO 639): th
From name: English
To name: Thai
Version: 1.0.0

On finish you should see something like this

Testing Model

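pip install argostranslate
python

import pathlib  # part of the standard library, no pip install needed

import argostranslate.package
import argostranslate.translate

# Refresh the remote package index (not required for a local install)
argostranslate.package.update_package_index()

# Install the trained model from the local .argosmodel file
package_path = pathlib.Path("<file_name>.argosmodel")
argostranslate.package.install_from_path(package_path)

# Translate a sample sentence from English to Thai
translatedText = argostranslate.translate.translate("Hello world", "en", "th")
print(translatedText)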

Troubleshooting

If you have CUDA issues this could be because your WSL version is not up to date. To update perform the following steps:

  • Go to Settings > Check for Updates

  • Select Advanced Options and turn on receiving updates for other products (WSL)

  • Go back, select “Check for updates” and install any updates. After a restart you should have the latest WSL version. You can check this with wsl cat /proc/version; the kernel version should be greater than 5.10.43.3
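
The version comparison is easy to get wrong by eye (5.10.102 is newer than 5.10.43), so here is a small sketch that does it numerically. The function names are hypothetical, the minimum is the 5.10.43.3 from the pre-requisites, and the sample string in the usage note is only an illustration of what wsl cat /proc/version typically prints.

```python
import re


def kernel_version(proc_version):
    """Extract the leading numeric kernel version as a tuple of ints."""
    match = re.search(r"Linux version (\d+(?:\.\d+)*)", proc_version)
    if match is None:
        raise ValueError("no kernel version found in the given text")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(proc_version, minimum=(5, 10, 43, 3)):
    """True when the kernel version is strictly greater than `minimum`."""
    # Tuples compare element by element, so 102 > 43 is handled correctly.
    return kernel_version(proc_version) > minimum
```

Paste the output of wsl cat /proc/version into is_new_enough(...); a string starting with "Linux version 5.10.102.1-microsoft-standard-WSL2" would pass the check, while one starting with "Linux version 5.10.16.3-microsoft-standard-WSL2" would not.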


WSL

Setup and Install

Install WSL and Ubuntu

Prerequisites: Windows 10 version 2004 and higher (Build 19041 and higher) or Windows 11.

Enable WSL

If you have an older PC with WSL 1 you will need to upgrade to WSL 2 and follow instructions from Step 4.

Then install Ubuntu (or your favourite Linux distro)

wsl --set-default-version 2
wsl --install -d Ubuntu

Docker

Setup and Install

Install Docker

Setup Docker with VS Code (Optional but Easier if you are a vim noob like me)

Install the Following Extensions

You should see the two highlighted items in your side bar for Remote Explorer and Docker management respectively
