Be inspired by features of other online translation services and discuss reverse engineering them

EmanuelLoos · August 17, 2021, 2:26pm

The best non-free online translation service I know of is DeepL. DeepL feeds user input into its translation AI, which makes it so accurate. In particular, DeepL doesn’t just give the user one possible translation but several. It displays only one at first, but if the translation doesn’t read right, the user can click on that part and pick from a list of alternatives. When the user does this, the choice is reported back to DeepL in some way, and according to DeepL’s privacy policy the AI is then trained to translate differently in that case, after the correction has been checked by someone working there. Once the AI has been trained, the user input is deleted. Also, clicking on a word shows its possible translations from a dictionary, which is helpful too.

LibreTranslate could implement this in a much more privacy-friendly way: ask the user before sending a correction/improvement, show exactly what data would be sent, and let the user decide whether to send it or not. There could also be a review page where other users check the submitted corrections. I believe implementing this would greatly improve translation quality.
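A minimal sketch of the opt-in flow described above, assuming a correction is a simple record of source text, machine output and the user’s fix. The names `Correction`, `preview_payload`, `submit_if_consented` and `ReviewItem`, the JSON schema, and the approval threshold are all hypothetical; nothing here is an existing LibreTranslate API. The key point is that the exact payload is rendered for the user first and is only handed to `send` after explicit consent.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class Correction:
    """One user-suggested fix to a machine translation (hypothetical schema)."""
    source_lang: str
    target_lang: str
    source_text: str
    machine_translation: str
    corrected_translation: str


def preview_payload(correction: Correction) -> str:
    """Render exactly the data that would be sent, so the user can inspect it."""
    return json.dumps(asdict(correction), indent=2, ensure_ascii=False)


def submit_if_consented(
    correction: Correction,
    ask_user: Callable[[str], bool],
    send: Callable[[str], None],
) -> bool:
    """Show the payload and send it only if the user explicitly agrees.

    Returns True if the correction was sent, False otherwise.
    """
    payload = preview_payload(correction)
    if not ask_user(payload):
        return False  # user declined: nothing is sent
    send(payload)
    return True


@dataclass
class ReviewItem:
    """A submitted correction waiting on the (hypothetical) review page."""
    payload: str
    approvals: int = 0
    rejections: int = 0

    def accepted(self, threshold: int = 2) -> bool:
        # Accept once approvals outweigh rejections by at least `threshold`.
        return self.approvals - self.rejections >= threshold
```

In a real client, `ask_user` would be a dialog that displays the JSON preview and `send` would post it to whatever collection endpoint the project chooses; passing them in as functions keeps the consent check independent of the UI and transport.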

What do you think about my idea?

argosopentech · August 18, 2021, 9:12am

There’s currently a repo for submitting community data, but it’s not really used:

Multiple translations are available through the ITranslation.hypotheses method in Argos Translate. Automated data collection could be a cool idea, though.

EmanuelLoos · October 16, 2021, 2:36pm

How about implementing a dictionary (Wiktionary?) that shows relevant information about the word currently under the cursor?

argosopentech · October 19, 2021, 11:33am

The models are currently trained with Wiktionary data but making more explicit use of definition data would also be possible.

EmanuelLoos · October 22, 2021, 5:54pm

But what if a word has multiple meanings? Wouldn’t it be good to show them all to the user, like DeepL does for the word under the cursor?
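A rough sketch of the cursor lookup being proposed: find the word around a character offset, then list every sense known for it. The `SENSES` table is hypothetical sample data standing in for definitions that could be extracted from Wiktionary, and the `\w+` word boundary is a simplification that ignores multi-word expressions and inflected forms (for example, "barked" would not match "bark").

```python
import re
from typing import Dict, List, Optional

# Hypothetical sample data; a real version might be built from a Wiktionary dump.
SENSES: Dict[str, List[str]] = {
    "bank": ["financial institution", "edge of a river", "to tilt an aircraft"],
    "bark": ["sound a dog makes", "outer covering of a tree"],
}

WORD_RE = re.compile(r"\w+")


def word_at_cursor(text: str, cursor: int) -> Optional[str]:
    """Return the word containing (or ending right at) the cursor offset, if any."""
    for match in WORD_RE.finditer(text):
        if match.start() <= cursor <= match.end():
            return match.group()
    return None


def senses_at_cursor(text: str, cursor: int) -> List[str]:
    """List every known meaning of the word under the cursor (empty if unknown)."""
    word = word_at_cursor(text, cursor)
    if word is None:
        return []
    return SENSES.get(word.lower(), [])
```

For example, `senses_at_cursor("We sat on the Bank.", 15)` returns all three senses of "bank", which a UI could show in a popup next to the cursor.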

argosopentech · October 23, 2021, 8:35pm

CTranslate2 and Argos Translate support multiple hypotheses, so we could do something like that for entire translations. It might also be possible to use more explicit dictionary data (definitions, synonyms, translations, word origins, pronunciations) to give to users.
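One way to turn multiple hypotheses into the DeepL-style "click to see alternatives" list is to rank them by score and drop near-duplicates. In Argos Translate the candidates would come from ITranslation.hypotheses; here a stand-in `Hypothesis` dataclass with `value` and `score` fields is used so the sketch runs without any models installed. Those field names, the `alternatives` helper and the assumption that a higher score means a better candidate are illustrative, not a confirmed Argos Translate or CTranslate2 interface.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Hypothesis:
    """Stand-in for a scored translation candidate (assumed fields)."""
    value: str
    score: float  # assumption: higher means more likely


def alternatives(hypotheses: List[Hypothesis], limit: int = 3) -> List[str]:
    """Return the best candidate first, followed by up to `limit` distinct alternatives.

    Candidates are sorted by score, and entries that differ only in
    surrounding whitespace (or are empty) are skipped.
    """
    ranked = sorted(hypotheses, key=lambda h: h.score, reverse=True)
    seen: Set[str] = set()
    result: List[str] = []
    for h in ranked:
        text = h.value.strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result[: limit + 1]
```

The first entry would be displayed as the translation and the rest offered in a dropdown; a user’s pick from that dropdown is exactly the kind of signal that could feed the opt-in correction flow discussed earlier in the thread.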