Submitting feedback for individual translations

We get a lot of feedback about Argos Translate and LibreTranslate where users report an individual word or phrase that they think was translated incorrectly. [1] [2]

When we get these reports as GitHub issues or forum posts, there’s not much we can productively do with the information besides saying, “this would be improved with better models.”

Where should we direct people to submit feedback?

They could submit suggested translations to the LibreTranslate.com/suggest API endpoint. We could also build a web frontend for the endpoint so people don’t have to make a POST request manually to submit feedback.
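
For anyone who wants to script it in the meantime, here’s a rough sketch of what that POST request could look like. The field names (`q`, `s`, `source`, `target`) follow the LibreTranslate API docs as I understand them, and libretranslate.com may additionally require an `api_key` field:

```python
import requests

# Sketch of a manual POST to the /suggest endpoint. Field names follow the
# LibreTranslate API docs; an api_key field may also be required on
# libretranslate.com.
response = requests.post(
    "https://libretranslate.com/suggest",
    data={
        "q": "Hola mundo",    # original source text
        "s": "Hello, world",  # suggested (corrected) translation
        "source": "es",       # source language code
        "target": "en",       # target language code
    },
)
response.raise_for_status()
print(response.json())
```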

We also have the Community Dataset repo, but we probably don’t want people making pull requests there for individual words because it could get difficult to maintain.


This seems like the most practical way (perhaps with a UI to help).


What would make the most sense would be for some kind of online service to start using the improved translation immediately, benefiting other users of the service right away, and for the correction to be included in the model later after some human review.

Using the submitted translations immediately probably isn’t practical because we’d need to retrain the language models every time someone submitted a new translation. I think we’ll just save the submitted translations and then add them to the training data when we’re training new models.
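
For illustration, the “save now, train later” flow could be as simple as appending each suggestion to a parallel-corpus file that gets merged in at the next training run. The file name and column layout below are made up for the example:

```python
import csv

# Illustrative sketch: append each suggestion to a TSV file that can be
# merged into the parallel training corpus when new models are trained.
# The file name and columns are hypothetical, not the project's format.
def save_suggestion(src_lang: str, tgt_lang: str,
                    source_text: str, suggested_translation: str,
                    path: str = "suggestions.tsv") -> None:
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow(
            [src_lang, tgt_lang, source_text, suggested_translation]
        )
```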

I’m a total novice in machine translation, but would something like this be achievable?
And note that my view may be tainted by my use case: I study languages, and most of my data consists of short sentences (which may have a higher hit rate in a cache-type setup).

  1. Process translations through the models.
  2. A user notices an incorrect translation and submits a correction.
  3. That correction is immediately used by the translation engine if the same input string is received again (through a cache lookup; see the sketch after this list).
  4. At some point later, a manual review process takes place to validate the correction. If it’s good, we incorporate it into the training data; if not, we remove it from the cache at step 3.
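
Here’s a rough sketch of steps 2 and 3 in Python. `machine_translate` is a hypothetical stand-in for the real engine call, and the in-memory dict would presumably be a real database in practice:

```python
# Rough sketch of steps 2 and 3: a correction cache in front of the engine.
corrections: dict[tuple[str, str, str], str] = {}  # (src, tgt, text) -> correction


def machine_translate(src_lang: str, tgt_lang: str, text: str) -> str:
    """Placeholder for the actual model call, e.g. Argos Translate's
    argostranslate.translate.translate(text, src_lang, tgt_lang)."""
    raise NotImplementedError


def submit_correction(src_lang: str, tgt_lang: str, text: str, corrected: str) -> None:
    """Step 2: store a user-submitted correction for reuse and later review."""
    corrections[(src_lang, tgt_lang, text)] = corrected


def translate(src_lang: str, tgt_lang: str, text: str) -> str:
    """Step 3: serve the cached correction if this exact input appears again."""
    cached = corrections.get((src_lang, tgt_lang, text))
    return cached if cached is not None else machine_translate(src_lang, tgt_lang, text)
```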

It would be necessary to check the submitted translations; I think it is too risky to display a new translation proposal without validating it beforehand.


You can use semantic similarity (cosine similarity) with SentenceTransformers to measure how close the translations are. I’ve found there are still some issues, and it doesn’t always pick up on problems (e.g., noisy or missing punctuation in the source or target doesn’t change the similarity score much).
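
For example, something like this (the model name is just one common multilingual choice, not a recommendation):

```python
from sentence_transformers import SentenceTransformer, util

# Sketch: score how close a submitted correction is to the model's output
# using multilingual sentence embeddings.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

machine_output = "He is going at the store."
suggested_fix = "He is going to the store."

embeddings = model.encode([machine_output, suggested_fix])
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity: {score:.3f}")  # near 1.0 = semantically very close
```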
