I uploaded a Chinese text file through my localhost script for Chinese-to-English conversion. The text is extracted, but wherever brackets and special characters are present in the text, they are not handled properly and I get garbage translations such as "I don’t know", "I’m sorry", "[Chuckles]", etc.
How can we address this issue in the localhost API script for Chinese-to-English conversion?
I am sharing a sample output here (Chinese to English):
.\nI don’t know.\nResearch findings\nFor infants and young children who cannot be fully breastfed
[摘要] 目的 了解近 5年中国 3~5岁学龄前儿童龋齿患病及治疗情况 。、湿疹 、起过敏的蛋白质,
I am sharing the input here for your information.
I am getting these repeated words during the Chinese-to-English conversion when special characters appear in between sentences, like this:
♪ I can’t ♪
3.I don’t know.
4.I don’t know.
I don’t know.
[Chuckles]
What?
What?
[ Chuckles ]
I’m sorry.
Also, special characters do not produce any proper output during the Chinese-to-English conversion (。、 ! , @ # $ /‘’-*() & etc.).
Could you elaborate on this issue, and how can we address it to get proper, exact output from the localhost API script during Chinese-to-English conversion?
Hi,
First, we need to know your localhost version: is it 1.9 (installed with pip), or did you download the updated code?
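If you are not sure which one you have, a quick way to check is shown below (a minimal sketch, assuming the pip package is named libretranslate):

```python
# Check which LibreTranslate version is installed via pip.
# Assumption: the package name is "libretranslate"; if this raises
# PackageNotFoundError you are probably running from a source checkout.
from importlib.metadata import version, PackageNotFoundError

try:
    print("libretranslate version:", version("libretranslate"))
except PackageNotFoundError:
    print("libretranslate not installed via pip (likely running from source)")
```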
The updated code is not optimal for Chinese, as it uses the spaCy multilingual model, which does not support Chinese non-alphanumeric characters.
Then, do you use the document translation, or a custom API?
First case: there are some issues with document translation that I have not clearly identified and that appeared during 2024. I run two labs: one has not been fully updated since December 2023, and documents translate more or less correctly, if slowly; the other has been updated, and document translation yields gibberish like the output you posted, although text translation is fine.
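To confirm on your side that text translation is fine while document translation is not, you could send one of your Chinese sentences straight to the text endpoint. This is a minimal sketch, assuming the instance runs on the default http://localhost:5000 and has the zh -> en pair installed:

```python
# Sanity check: translate plain text through the localhost /translate endpoint,
# bypassing the document translation path entirely.
import requests

resp = requests.post(
    "http://localhost:5000/translate",
    json={
        "q": "[摘要] 目的 了解近 5年中国 3~5岁学龄前儿童龋齿患病及治疗情况。",
        "source": "zh",
        "target": "en",
        "format": "text",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["translatedText"])
```

If this returns a sensible English sentence while the uploaded document still produces gibberish, the problem is in the document extraction/segmentation step rather than in the model.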
If you use a custom API, then preprocess your files into sentences with stanza for Chinese (get the old code through GitHub), and feed them to the model with the code in Locomotive’s eval.py, which is optimized to take a list of sentences and deliver them as such: Locomotive/eval.py at main · LibreTranslate/Locomotive · GitHub
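As a rough illustration of that preprocessing step (a sketch only; translate_sentences below is a hypothetical placeholder for your custom API call or the eval.py-style batching):

```python
# Split raw Chinese text into sentences with stanza before translation,
# so brackets and CJK punctuation are segmented properly instead of being
# pushed through the model as one malformed block.
import stanza

stanza.download("zh", verbose=False)                      # one-time model download
nlp = stanza.Pipeline(lang="zh", processors="tokenize", verbose=False)

with open("input_zh.txt", encoding="utf-8") as f:         # hypothetical input file
    raw_text = f.read()

doc = nlp(raw_text)
sentences = [sent.text for sent in doc.sentences]

# Feed the model a list of sentences, as eval.py does, rather than one big string.
# translations = translate_sentences(sentences, source="zh", target="en")  # placeholder
```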