Handling the formal/informal distinction in Argos Translate

Yeah, this is a hard problem. Separate formal and informal model variants for Spanish, French, etc. would probably be excessive and would increase the bandwidth needed to download all of the language models. With multilingual translation, maybe it would be more efficient to have separate language codes for different flavors of the same language?

Plus I don’t know how we would find formal/informal data; OPUS doesn’t make this distinction. I normally default to the ISO 639 language codes, but I don’t think they have a formal/informal distinction either.
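To make the "separate language codes" idea a bit more concrete, here's a rough sketch of how a formality flavor could ride along on a language code. This is purely an assumption on my part: it uses a BCP 47 style private-use suffix (`-x-formal` / `-x-informal`), and `LanguageFlavor` and `parse_code` are hypothetical names, not anything that exists in Argos Translate today.

```python
from typing import NamedTuple, Optional


class LanguageFlavor(NamedTuple):
    base: str                 # plain language code, e.g. "es"
    formality: Optional[str]  # "formal", "informal", or None if unspecified


def parse_code(code: str) -> LanguageFlavor:
    """Split a hypothetical code like 'es-x-formal' into base code and formality.

    Codes without a recognized private-use suffix are returned unchanged
    with formality set to None.
    """
    base, sep, private = code.partition("-x-")
    if sep and private in ("formal", "informal"):
        return LanguageFlavor(base, private)
    return LanguageFlavor(code, None)
```

With something like this, `parse_code("es-x-formal")` gives the base code `es` plus a `formal` flag, while a plain `es` still works as it does now, so existing codes wouldn't need to change.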

We could pass metadata to the translation model to tell it whether you want formal or informal output, but that would add overhead and require more infrastructure. The current system should try to infer whether you want formal or informal based on the context and on how similar sentences were translated in the training data.
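One common way to pass that kind of metadata is to prepend a control token to the source text before it goes to the model. Here's a minimal sketch of what that could look like. The tag strings and the `tag_source` function are made up for illustration, this is not how Argos Translate works today, and the model would have to be trained on data tagged the same way for it to have any effect.

```python
from typing import Optional

# Hypothetical control tokens; a real model would need these in its vocabulary
# and in its training data.
FORMALITY_TAGS = {"formal": "<formal>", "informal": "<informal>"}


def tag_source(text: str, formality: Optional[str] = None) -> str:
    """Prepend a formality control token to the source text.

    With formality=None the text is returned unchanged, so the model
    falls back to inferring the register from context.
    """
    if formality is None:
        return text
    if formality not in FORMALITY_TAGS:
        raise ValueError(f"unknown formality: {formality!r}")
    return f"{FORMALITY_TAGS[formality]} {text}"
```

So `tag_source("How are you?", "formal")` would send `<formal> How are you?` to the model. The extra infrastructure is mostly on the data side: you'd need a way to label training sentence pairs as formal or informal in the first place.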

This problem could also be solved with few-shot translation models like ChatGPT that understand more of the context for the user’s translation, but that’s not currently how Argos Translate works.