Handling the formal/informal distinction in Argos Translate

argosopentech · April 19, 2023, 1:14am

github.com/argosopentech/argos-translate

The question of polite forms in languages other than English

opened 11:43AM - 31 Mar 23 UTC

closed 12:51PM - 14 May 23 UTC

Hi @[PJ-Finlay](https://github.com/argosopentech/argos-translate/commits?author=…PJ-Finlay),

First of all, thank you for developing this software. I have managed to find workarounds for my CUDA-related issue, and I was also able to quickly build a Tk interface, as I could not get the GUI to work on Windows 10 at this point. The fact that everything has an easy solution is great, and the quality of translations is reasonably good for a free tool.

One thing that could greatly improve translation quality is a way to tell the software which "politeness" level its translations should use. This would be helpful for languages such as Japanese, but also for many European languages such as German (du vs. Sie), French (tu vs. vous), Spanish (tú vs. usted), and probably others.

In my scenario, I was trying to translate a polite letter from English to Spanish, and I noticed that "you" was translated as "tú" instead of "usted", and the verbs were also in the 2nd person, which is not polite. Since I can also write French, I decided to try a direct French-to-Spanish translation using "vous", but it turns out that this is translated fr -> en -> es, and since English has no polite form, the "vous" was lost along the way. (This also means that using English as a pivot language is not a great idea, as the language is semantically quite poor, by the way.)

I found a "workaround" by putting "usted" in my English text instead of "you", but I'm not 100% sure the verbs are conjugated correctly. Can you think of something, or point me in a direction where I could find solutions? Many thanks in advance!

In my scenario, I was trying to translate a polite letter from English to Spanish, and I noticed that “you” was translated as “tú” instead of “usted”, and the verbs were also in the 2nd person, which is not polite.

This is a good question. English doesn’t have a polite form (we’re straightforward and rude by default haha). So if you’re translating from English to a language with a formal/informal distinction, there currently isn’t a good way to tell Argos Translate which one you want.

I’m open to suggestions on how to handle this well. I’ve generally defaulted to a Unicode->Unicode architecture for Argos Translate where the neural network handles any complexity with specific languages. However, for this issue maybe it could be useful to pass some sort of metadata to the translation model about the type of translation the user wants (formal/informal etc.).
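One common way to pass this kind of metadata to a neural translation model is to prepend a control token to the source sentence. Here is a minimal sketch of that idea. The `<formal>` and `<informal>` tokens and the `tag_source` helper are hypothetical and not part of Argos Translate, and a model would only respect them if it had been trained on data tagged the same way.

```python
from enum import Enum
from typing import Optional


class Formality(Enum):
    # Hypothetical control tokens; a model must be trained to recognize them.
    FORMAL = "<formal>"
    INFORMAL = "<informal>"


def tag_source(text: str, formality: Optional[Formality] = None) -> str:
    """Prepend a formality control token to the source text.

    With no formality requested, the text is returned unchanged, so the
    model falls back to inferring the register from context.
    """
    if formality is None:
        return text
    return f"{formality.value} {text}"


tagged = tag_source("How are you?", Formality.FORMAL)
untagged = tag_source("How are you?")
```

The tagged string would then go through the usual translation call in place of the raw text, so the Unicode->Unicode interface stays the same and the extra information travels inside the input.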

pierotofy · April 19, 2023, 1:48am

This could be a separate model (or a fine-tuned model?).

From a language model index perspective, something that perhaps is still missing is the ability to have different “variants” of the language models, whether it’s a formal/informal distinction or a particular language (e.g. British or American English).

argosopentech · April 19, 2023, 2:23am

Yeah this is a hard problem. Different formal/informal language model variants for Spanish, French, etc. would probably be excessive and increase the bandwidth required to download all of the language models. With multilingual translation maybe it would be more efficient to have separate language codes for different flavors of the same language?

Plus I don’t know how we would find data for formal/informal; OPUS doesn’t make this distinction. I normally default to the ISO 639 language codes, but I don’t think they have a formal/informal distinction.

We could pass metadata to the translation model to give it information about formal/informal, but that would add the overhead of more infrastructure. The current system should try to infer whether you want formal or informal based on the context and on how similar sentences were translated in the training data.

This problem could also be solved with few-shot translation models like ChatGPT that understand more of the context for the user’s translation, but that’s not currently how Argos Translate works.

argosopentech · April 19, 2023, 2:58am

I think there are standard language codes for regional dialects of a language. For example, British English is en-GB. However, there generally aren’t codes for specific situations in a language like formal/informal.
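If separate codes for formal/informal were ever wanted, one option might be the private-use subtag that BCP 47 language tags reserve after an `x` singleton. A rough sketch, assuming a made-up convention such as `es-x-formal`; neither the convention nor the `split_register` helper exists in Argos Translate:

```python
from typing import Optional, Tuple


def split_register(tag: str) -> Tuple[str, Optional[str]]:
    """Split a language tag into (base tag, private-use register).

    "es-x-formal" -> ("es", "formal")
    "en-GB"       -> ("en-GB", None)

    The "-x-" private-use marker is standard BCP 47; using it to carry
    "formal"/"informal" is only an assumption made for this sketch.
    """
    base, sep, private = tag.partition("-x-")
    if not sep or not private:
        return base, None
    return base, private


formal_spanish = split_register("es-x-formal")
british_english = split_register("en-GB")
```

The base tag could still be used to look up the ordinary language model, with the register passed along separately.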