Measuring Sentiment Bias in Machine Translation (Argos Translate, fairseq-nllb, BERT2BERT)

Biases induced to text by generative models have become an increasingly large topic in recent years. In this paper we explore how machine translation might introduce a bias in sentiments as classified by sentiment analysis models. For this, we compare three open access machine translation models for five different languages on two parallel corpora to test if the translation process causes a shift in sentiment classes recognized in the texts. Though our statistical tests indicate shifts in the label probability distributions, we find none that appears consistent enough to assume a bias induced by the translation process.

This study set out to explore whether MT systems introduce biases in sentiment expressions. We compared three translation models (fairseq-nllb [27], Argos-translate [10], and BERT2BERT [35]) for five languages (German, English, Hebrew, Spanish, and Chinese) from the TED2020 and Global Voices corpora. Our statistical analyses (paired t-test and χ²-test) were not able to confirm any bias. The closest to this is the translation from German to English by the Argos translation system, which causes a shift towards neutral sentiments for both corpora. This ‘bias’, however, cannot be substantiated by a notably large WD.
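For anyone curious how a comparison like this might look in practice, here is a rough sketch in plain Python. It is not the paper's code. The three-class label set, the toy labels and scores, and all function names are made up for illustration, and I'm assuming "WD" stands for the Wasserstein distance between the label distributions before and after translation.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev

# Assumed ordered label set; the paper's classifiers may use different classes.
LABELS = ["negative", "neutral", "positive"]


def label_distribution(labels):
    """Relative frequency of each sentiment label, in LABELS order."""
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in LABELS]


def wasserstein_1d(p, q):
    """1-D Wasserstein distance between two distributions on the ordered
    labels, treating neighbouring labels as one unit apart.
    It equals the sum of absolute differences between the two CDFs."""
    distance, cdf_gap = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        distance += abs(cdf_gap)
    return distance


def paired_t_statistic(before, after):
    """t statistic of a paired t-test on per-sentence scores."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical labels for the same six sentences, before and after translation.
source_labels = ["positive", "neutral", "negative", "positive", "neutral", "neutral"]
translated_labels = ["neutral", "neutral", "negative", "positive", "neutral", "neutral"]

# Hypothetical per-sentence probabilities of the "positive" class.
source_scores = [0.9, 0.4, 0.1, 0.8]
translated_scores = [0.7, 0.4, 0.2, 0.6]

p = label_distribution(source_labels)
q = label_distribution(translated_labels)
print("source distribution:    ", p)
print("translated distribution:", q)
print("Wasserstein distance:   ", wasserstein_1d(p, q))
print("paired t statistic:     ", paired_t_statistic(source_scores, translated_scores))
```

In this toy data one "positive" sentence becomes "neutral" after translation, which is the kind of shift towards neutral the conclusion mentions. A real analysis would also need a p-value for the t statistic and a χ² test on the label counts, which a statistics library such as SciPy provides.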

I found this paper doing a Google Scholar search for Argos Translate. It looks like they’re trying to see if machine translation changes the sentiment of translated text.

Ah, that’s awesome. It’s always great to see one’s software being cited in a paper.

