To get this production-ready, we would need to:
- Train a new model using this data.
- Write the code to break up text, run inference on it, and rebuild the XML structure.
- Generate data for other languages.
- Train new models with tag data.
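As a rough illustration of the break up / infer / rebuild step, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is only an assumption about how that code might look, not an existing Argos Translate API: `translate_xml` is a hypothetical helper, and `translate` is a placeholder for whatever inference call ends up being used. It translates each text segment on its own, so it does not reflect how a model trained on tag data would see the surrounding markup.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def translate_xml(xml_text: str, translate: Callable[[str], str]) -> str:
    """Translate the text inside an XML fragment while keeping its tags.

    `translate` is a placeholder for the real inference call (hypothetical).
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Text directly inside the element, before its first child.
        if element.text and element.text.strip():
            element.text = translate(element.text)
        # Text that follows the element's closing tag.
        if element.tail and element.tail.strip():
            element.tail = translate(element.tail)
    # Serialize the tree back to a string with the original tag structure.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # str.upper stands in for a real translation function.
    print(translate_xml("<p>Hello <b>world</b>!</p>", str.upper))
```

Translating segments independently loses context across tag boundaries, which is part of why training models with tag data is on the list.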
I’m currently planning to do few-shot translation with an API model provider and then come back to this. Since model training is time-consuming and expensive, I’m planning to train new models all at once for Argos Translate 2.0, along with other potentially breaking changes like removing the tokenizer. If anyone is interested in working on this, we could train a test model and try running inference before scaling up to more languages.