Hi guys,
I ran into some wild output when I tried to translate Russian text with the direction set to English → Russian (yes, exactly like that). It's not a big deal, I just have to check the language before returning the result to the user.
I can't post links, not even to LibreTranslate. Just ask ChatGPT to write a few sentences in Russian and then translate them from ru to en. You don't need to know the language to see that the result is weird. The other way around it works fine.
Models are trained in one direction (at least in LibreTranslate), so input in a language the model has never seen on the source side won't be interpreted correctly. A model can't look at an input and an output and learn the semantics of going from output back to source, only from source to output.
Use the open-source fastText language detection model, it's good enough. And if it isn't, you can use something like Lingua. It's even possible to train a higher-quality fastText model from language data available online, like I did, because a solution like the one I just linked is much slower than fastText.
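For anyone who wants to wire that check in, here is a rough sketch of the idea, not a definitive implementation. It assumes the `fasttext` Python package is installed and that a pretrained language identification model file (here `lid.176.ftz`, assumed to be downloaded locally) is available. The names `make_fasttext_detector`, `check_source_language`, and the `min_confidence` threshold are made up for illustration.

```python
from typing import Callable, Tuple

# A detector takes text and returns (language_code, confidence).
Detector = Callable[[str], Tuple[str, float]]


def make_fasttext_detector(model_path: str = "lid.176.ftz") -> Detector:
    """Build a detector backed by a fastText language ID model.

    The model path is an assumption: point it at wherever you saved the file.
    """
    import fasttext  # imported lazily so the rest works without the package

    model = fasttext.load_model(model_path)

    def detect(text: str) -> Tuple[str, float]:
        # fastText predicts one line at a time, so newlines must be removed.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        # Labels look like "__label__ru"; strip the prefix.
        return labels[0].replace("__label__", ""), float(probs[0])

    return detect


def check_source_language(
    text: str,
    declared_source: str,
    detect: Detector,
    min_confidence: float = 0.5,  # hypothetical threshold, tune for your data
) -> bool:
    """Return True if the text plausibly matches the declared source language."""
    lang, confidence = detect(text)
    if confidence < min_confidence:
        # The detector is unsure (e.g. very short input), so don't block.
        return True
    return lang == declared_source
```

Usage would be something like `detect = make_fasttext_detector()` once at startup, then calling `check_source_language(text, "en", detect)` before sending the request to the translator, and warning the user or swapping the direction when it returns `False`. The detector is passed in as a plain callable so it can be replaced with Lingua or a custom-trained fastText model without touching the check.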