Multilingual translation with CTranslate2 and pre-trained FairSeq models

argosopentech · September 12, 2022, 1:00pm

It’s a bit hacky, but you’ll probably need to use the seq2seq sentence boundary detection system instead of Stanza to get this to work:

export ARGOS_STANZA_AVAILABLE=0
export ARGOS_DEV_MODE=1
argos-translate-gui

The plan going forward is to better support multilingual models (ones that handle many languages, not just a single from_lang and to_lang) by adding a list of languages to the model package’s metadata.json:

{
    "languages": [
        {
            "code": "en",
            "name": "English"
        },
        {
            "code": "es",
            "name": "Spanish"
        },
        {
            "code": "chunk",
            "name": "Chunk"
        }
    ]
}
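
As a rough illustration of how a loader might read that proposed field, here is a minimal Python sketch. The helper names (language_codes, translation_pairs) are hypothetical and not part of Argos Translate, and treating "chunk" as a non-translation entry is an assumption based on the description in this post.

```python
import json

# The proposed metadata.json content from this post (a plan, not a released format).
METADATA_JSON = """
{
    "languages": [
        {"code": "en", "name": "English"},
        {"code": "es", "name": "Spanish"},
        {"code": "chunk", "name": "Chunk"}
    ]
}
"""

# Assumption: "chunk" is a pseudo-language used for sentence splitting.
CHUNK_CODE = "chunk"


def language_codes(metadata):
    """Return every language code listed in the metadata, in order."""
    return [lang["code"] for lang in metadata.get("languages", [])]


def translation_pairs(metadata):
    """Return every ordered (from, to) pair of distinct natural languages."""
    codes = [code for code in language_codes(metadata) if code != CHUNK_CODE]
    return [(a, b) for a in codes for b in codes if a != b]


metadata = json.loads(METADATA_JSON)
codes = language_codes(metadata)
pairs = translation_pairs(metadata)
supports_chunking = CHUNK_CODE in codes

print(codes)
print(pairs)
print(supports_chunking)
```

With a flat list like this, a single package can advertise every direction it supports, and the presence of "chunk" can signal that the same model also handles sentence splitting.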

I then want to make “chunk” a valid language that works similarly to the current seq2seq chunking system, splitting input text into sentences that are translated separately. Currently, if you’re not using Stanza, you have to install the sbd system as its own package (available on the dev index), but I also want to support bundling it with the main translation model.
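
To make the “chunk” idea concrete, here is a small Python sketch of the splitting loop. It assumes that translating into "chunk" returns the leading sentence of the input, which is my reading of the plan and not a confirmed design. fake_chunk_model is a hypothetical stand-in for the real model, and none of these names exist in Argos Translate.

```python
def fake_chunk_model(text):
    """Stand-in for translating into "chunk": return the leading sentence.

    A real model would predict the boundary; this stub just cuts at the
    first ".", "!" or "?" so the loop below can be exercised.
    """
    for i, ch in enumerate(text):
        if ch in ".!?":
            return text[: i + 1]
    return text


def split_with_chunk_model(text, chunk_model, max_chunks=1000):
    """Peel sentences off the front of `text` one chunk at a time."""
    sentences = []
    remaining = text.strip()
    while remaining and len(sentences) < max_chunks:
        chunk = chunk_model(remaining)
        # Guard: if the model returns nothing, or text that is not a prefix
        # of the input, fall back to treating the rest as a single chunk.
        if not chunk or not remaining.startswith(chunk):
            chunk = remaining
        sentences.append(chunk.strip())
        remaining = remaining[len(chunk):].lstrip()
    if remaining:
        # max_chunks was reached; keep the leftover text instead of dropping it.
        sentences.append(remaining)
    return sentences


sentences = split_with_chunk_model(
    "Hello world. How are you? Fine.", fake_chunk_model
)
print(sentences)
```

Each resulting sentence could then be passed to the translation model on its own, which is the role Stanza or the separate sbd package plays today.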