I’ve started working on NLLU (https://github.com/LibreTranslate/nllu) with the goal of running NLLB inference cheaply and at scale to generate a corpus of back-translated data for a variety of languages.
As a first run, I’m translating 15 million Paracrawl sentences from English → Italian. It should take about a week.
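For a rough sense of scale, the figures above imply a sustained throughput that can be worked out as follows. This is only a back-of-the-envelope sketch: it assumes "about a week" means exactly 7 days of uninterrupted processing, and the function name `required_throughput` is made up for illustration (it is not part of NLLU).

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumption: "about a week" is taken as exactly 7 days of continuous running.
SENTENCES = 15_000_000
DAYS = 7


def required_throughput(sentences: int, days: float) -> float:
    """Return the average sentences per second needed to finish in `days`."""
    seconds = days * 24 * 60 * 60
    return sentences / seconds


rate = required_throughput(SENTENCES, DAYS)
print(f"{rate:.1f} sentences/second")
```

Under those assumptions the whole setup has to average roughly 25 sentences per second; any downtime would push the required rate higher.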
I plan to generate data for Polish and Dutch next.