Improving Chinese Translations

The Chinese model isn’t very good; we should train a new one.

I trained a new English → Chinese model. I’m still testing, but I think it’s an improvement over the current Chinese package, at least in some cases.

translate-en_zh-1_7.argosmodel
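
In case anyone wants to try the file above locally, here is a minimal sketch of installing a downloaded package and translating with it from Python. It assumes Argos Translate is installed and that the file name follows the `translate-<from>_<to>-<major>_<minor>.argosmodel` pattern seen in this thread. `parse_model_name` and `install_and_translate` are hypothetical helper names, not part of the library.

```python
from __future__ import annotations

import re
from pathlib import Path

# Pattern inferred from the file names in this thread (an assumption),
# e.g. translate-en_zh-1_7.argosmodel
_NAME = re.compile(r"^translate-([a-z]+)_([a-z]+)-(\d+)_(\d+)\.argosmodel$")


def parse_model_name(filename: str) -> tuple[str, str, str]:
    """Return (from_code, to_code, version) parsed from a package file name."""
    match = _NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"unrecognised package name: {filename}")
    from_code, to_code, major, minor = match.groups()
    return from_code, to_code, f"{major}.{minor}"


def install_and_translate(model_path: str, text: str) -> str:
    """Install a downloaded package file and translate text with it."""
    # Imported lazily so the name parsing above works without Argos Translate.
    import argostranslate.package
    import argostranslate.translate

    from_code, to_code, _version = parse_model_name(model_path)
    argostranslate.package.install_from_path(Path(model_path))
    return argostranslate.translate.translate(text, from_code, to_code)
```

Something like `install_and_translate("translate-en_zh-1_7.argosmodel", "Hello world")` should then return the Chinese output, which makes it easy to paste in the same test passages used below.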

English

Ich will den Kreuzstab gerne tragen (lit. ‘“I will gladly carry the cross-staff”’), BWV 56, is a church cantata composed by Johann Sebastian Bach for the 19th Sunday after Trinity. It was first performed in Leipzig on 27 October 1726. The composition is a solo cantata (German: Solokantate) because, apart from the closing chorale, it requires only a single vocal soloist, in this case a bass. The autograph score is one of a few cases where Bach referred to one of his compositions as a cantata. In English, the work is commonly referred to as the Kreuzstab cantata. Bach composed the cantata in his fourth year as Thomaskantor; it is regarded as part of his third cantata cycle.

The text was written by Christoph Birkmann, a student of mathematics and theology in Leipzig who collaborated with Bach. He describes in the first person a Christian willing to “carry the cross” as a follower of Jesus. The poet compares life to a voyage towards a harbour, referring indirectly to the prescribed Gospel reading which says that Jesus travelled by boat. The person, at the end, yearns for death as the ultimate destination, to be united with Jesus. This yearning is reinforced by the closing chorale: the stanza “Komm, o Tod, du Schlafes Bruder” (“Come, o death, you brother of sleep”) from Johann Franck’s 1653 hymn “Du, o schönes Weltgebäude”, which uses the imagery of a sea voyage.

Current en->zh model

我希望登缩Kreuzstab gerne tragen(lit. “I” ,我高兴地携带交叉人员,“BWV56”,这是由Johann Sebastian Bach组成的教堂,在Trinity后的19个星期。 这是最早于10月27日在莱普齐格举行的。 组成是一家女市长(希腊:特派),因为除了关闭的服装外,它只要求一名单一的联络点,在这种情况下,只需要一名邮票。 汽车分数是几个案例之一,其中每一个人都提到其组成是可以想象的。 英文说,工作通常称为Kreuzstabta。 每一位成员在第四年组成了塔塔塔尔,被看作是其第三大可能塔周期的一部分。

该案文由Christooph Birkmann撰写,这是数学的学生,也是与Bach合作的Leipzig学。 他首先描述一个愿意“跨越”的基督教徒。 诗歌将生命与朝一港口的旅舍相比较,间接提及了所谓的“热爱”乘船。 最终,作为最终目的死亡的年终者,将团结起来。 这一年因关闭的服装而得到加强:Sstanza “Komm, o Tod, du Schlafes Bruder”(“Come, 死亡,你的兄弟睡觉”,来自Johann Franck’1653 hymn “Du, oschönes Weltgebäude”,利用海上航线的图像。

Proposed

Ich will den Kreuzstab gerne tragen (lit `I will grly re the cross-staff’), BWV 56, is a Church canta composed of Johann Sebastian Bach for the 19th Sunday after Tri. 第一次于1726年10月27日在莱普齐格进行。 组成是一个 so子公司(German: Solokantate),因为除了关闭园区外,它只需要一个单一的协调人,在这种情况下需要一个大使馆。 自动记分数是Bach把其组成之一称为“直系”。 英文的工作通常称为Kruzstab canta。 每一家都在其第四年作为托马斯卡纳克特人组成,被视为他第三个罐体周期的一部分。

案文由Leipzig数学和神学学生Christoph Birkmann撰写,他与Bach合作。 他首先描述了一名愿意“穿过十字路口”的基督教信徒。 诗人把生命与前往港口的航程相比较,间接地提到《福音》的传译,其中说,耶稣乘坐船。 此人最终将死亡作为最终目的地,与耶稣团结在一起。 关闭的园区“Komm, o Tod, du Schlafes Bruder”(“Come, o death, 您的兄弟睡觉”)从Johann Franck的1653年“Du, o schönes Weltgebäude”上演,后者使用了海上航程的图像。

English

In September and October 2022 the Conservative government of the United Kingdom, led by the newly-appointed prime minister Liz Truss, faced a credibility crisis that led to Truss’ resignation.

The crisis began following the September mini-budget, which was received negatively by the world financial markets. It ultimately led to the dismissal of the chancellor of the Exchequer, Kwasi Kwarteng, on 14 October, and his replacement by Jeremy Hunt. In the following days Truss came under increasing pressure to reverse further elements of the mini-budget to satisfy the markets, and by 17 October five Conservative members of parliament had called for her resignation.

Current

在9月和10月20日,由新任命的首席部长Lz Truss领导的联合王国保守政府面临着导致入侵的可信危机。

危机始于9月份的小预算,这是世界金融市场的负面影响。 最终导致10月14日Kwasi Kwarteng被撤职,他由Jeremy Hunt取代。 此后几天,入侵越来越严重,迫使其扭转小预算的更多因素,以满足市场,到10月17日,议会五个观察成员要求辞职。

Proposed

2022年9月和10月,由新任命的总理利兹·特鲁斯领导的联合王国保守政府面临着导致特鲁斯辞职的可信危机。

危机始于9月份的小型预算,而这一预算受到世界金融市场的消极影响。 最终导致10月14日开除了Exchequer、Kwasi Kwarteng的陪审员,并由杰里米·亨特接替他。 在随后的几天里,Trus人受到越来越大的压力,要扭转小型预算中满足市场需求的内容,到10月17日,议会的五名保守成员要求她辞职。

English

Upon Sweyn Forkbeard’s death, Canute’s brother Harald was King of Denmark. Canute went to Harald to ask for his assistance in the conquest of England, and the division of the Danish kingdom. His plea for division of kingship was denied, though, and the Danish kingdom remained wholly in the hands of his brother, although, Harald lent to Canute the command of the Danes in any attempt he had a mind to make on the English throne. Harald probably saw it was out of his hands anyway. It was a vendetta that held his brother, Canute, and the Vikings driven away in spite of their conquest with Forkbeard. They were bound to fight again, on the basis of vengeance for betrayal.

It is possible Harald was at the siege of London, and the King of Denmark was content with Canute in control of the army. His name was to enter the fraternity of Christ Church, Canterbury, at some point, in 1018, although it is unsure if it was before or after he went home to Denmark with the invasion fleet of his Danes.

In 1018, Harold II died and Canute succeeded him. In 1019, he was to return to Denmark to over-winter, and affirm his succession to the Danish crown. With a Letter in which he states intentions to avert troubles to be done against England, it seems Danes were set against him, and the attack on the Wends was possibly part of his suppression of dissent. In the spring of 1020 he was back in England, his hold on Denmark presumably stable. Ulf Jarl, his brother-in-law, was his appointee as the Earl of Denmark.

Current

在Sweyn Forkbeard去世后,Caute’兄弟Harald是丹麦国王。 对Harald要求他协助寻求英格兰和丹麦王国的分裂。 他对国王的分裂表示不服,但丹麦王国完全以他的兄弟为手,尽管Harald lent to Canute the指挥 of the Danes的任何企图都用英文诗歌。 哈利德可能发现,这几条道路是手无的。 这是一个煽动的朋友,尽管他们寻求福克贝德。 他们不得不再次以报复为条件进行战斗。

有可能遭到伦敦的包围,丹麦国王在军队控制下被枪击。 他的名字是进入圣克里斯蒂安会的兄弟,在大约1018年里,尽管在他来到丹麦之前或之后,他前往丹麦的Danes的入侵队,这是不可靠的。

在1018年,哈德二世去世,加拉特接替他。 在1019年,他将返回丹麦,并申明他继承丹麦王位。 他在信中表示打算避免对英格兰的麻烦,他似乎对他的袭击,而对我们的进攻很可能是他镇压不满的一部分。 在1020年春天,他在英格兰返回,他对丹麦的声望稳定。 Ulf Jarl,他的兄弟法是他被任命为丹麦的埃拉尔。

Proposed

Upon Sweyn Forkbeard的去世,Canute的兄弟Harald是丹麦国王。 Canute去Harald,要求他协助英国征服和丹麦王国分裂。 他的国王分离请求遭到拒绝,但丹麦王国仍然完全掌握着他的兄弟的手,尽管哈勒德曾试图在英国王位上调丹麦人的指挥。 Harald可能看到他没有手。 它是一座etta子,他兄弟Canute和Vikings不顾与Forkbeard的交火被赶走。 他们必须在复仇基础上再次作战。

哈拉尔德有可能被伦敦包围,丹麦国王也同意控制军队。 他的名字是于1018年进入Canterbury的基督教会的兄弟,尽管如果他在丹麦人入侵船队之前或之后来到丹麦,那是不可靠的。

1018年,Harold II死亡,Canute接替他。 1019年,他将返回丹麦,再过间,并确认他继承丹麦王。 他在信中表示有意避免对英格兰造成麻烦,似乎丹麦人反对他,对温斯的袭击可能是他压制不同意见的一部分。 1020年的春天,他回到英格兰,对丹麦的统治大概稳定。 Ulf Jarl是他的兄弟,是丹麦的委托人。

:clap: It would be interesting to hear from Chinese speakers here on the forum about the improvements.

Here’s the Chinese → English model:

translate-zh_en-1_7.argosmodel

Chinese


艾米利亚·普莱特,波兰立陶宛贵族革命家。1830年,十一月起义爆发,她组建了一支小部队,参加起义,与俄罗斯帝国军队作战,参与了几场战斗,并获得起义军的上尉军衔。1831年12月23日,起义被镇压后,普莱特因病逝世,年仅25岁。作为起义的代表性人物之一,她在波蘭立陶宛白俄罗斯等国被视为民族英雄。

1.7 - Proposed

Ayilia Plet, Minister of Justice, Revolution of Lithuania, Poland. In 1830, when the intifada erupted in November, she formed a small force to participate in the intifada, to fight Russian imperial forces, to participate in several battles and to obtain the rank of Captain of the intifada. On 23 December 1831, following the repression of the intifada, Plat died as a result of illness and was only 25 years old. As a representative of the intifada, she was considered a national hero in countries such as Bo, Lithuania and Belarus.

1.1 - Current

Emilial Pellet, Poland’s distinguished ethnic, revolutionary. In 1830, she formed a small force to participate in the intifada, engage in battles with the Russian embassy, engage in several battles and obtain the title of the Lord’s Army. On 23 December 1831, following the repression of the Intifada, Pleit died of illness and only 25 years of age. As a representative of the intifada, she was seen as a national heroic in countries such as Porn, Lithuania and Belarus.

Chinese

维尔纽斯位于立陶宛的东南部(54°41′N 25°17′E)。距离白俄罗斯边界仅有40公里。维尔纽斯的地理位置位于立陶宛的一角,造成这种情况应归因于几个世纪以来立陶宛这个国家边界形状的改变;过去,维尔纽斯曾经处于立陶宛大公国的地理中心。

维尔纽斯位于维尔尼亚河内里斯河的汇合处。据认为维尔纽斯的名称就得名于穿过城市的维尔尼亚河。维尔纽斯距离波罗的海和立陶宛主要海港克莱佩达312公里。维尔纽斯和立陶宛其他主要城市之间以公路相连,距离考纳斯102公里,希奥利艾214公里, 帕內韋日斯135公里。

维尔纽斯附近是声称是欧洲地理中心的几个地点之一。

维尔纽斯市的面积为402平方公里。其中20.2%的地区被建筑物所覆盖,绿地占面积的43.9%,水面占2.1%。

维尔纽斯的气候类型介于大陆性气候与海洋性气候之间,年平均气温为6.1 °C;一月平均气温为−4.9 °C,七月平均气温位17.0 °C。年平均降水量月为661毫米。

维尔纽斯的夏季溫暖,整個白天的气温都可以來到20幾攝氏度,偶爾還會破30度,这时该市的夜生活相当活跃,在白天,户外酒吧和咖啡馆非常普遍。夏季的歷史最高溫是1959年7月的35.4度。

冬季非常寒冷,气温很少在0摄氏度以上,在一月和二月,气温低于零下20摄氏度并不罕见。在每年寒冷的冬季,维尔纽斯的河流和附近的湖泊都会封冻,一项流行的消遣活动就是冰上钓鱼,钓鱼者在冰上凿出一个洞,然后用带饵的钩钓鱼。冬季的歷史最低溫是1940年1月的零下37.2度。

1.7 - Proposed

The 位于 is located in the south-east of Lithuania (54°41’N 25°17’E). There are only 40 kilometres from the Belarus border. The geographical location of the Gulf of Guinea is situated in the corner of Lithuania, which is due to changes in the country’s boundaries over centuries; in the past, the Gulf of Guinea has been in the geographical centre of the Grand Duchy of Lithuania.

The 位于 is located in the junctions of the Vernia and Neris rivers. It is believed that the name of the thorium will be known to pass through the city of Vernia. It is 312 kilometres from the main seaports of the Baltic Sea and Lithuania. Road links exist between Brunei and other major Lithuanian cities, 102 kilometres from Cunas, 214 kilometres from Chioli and 135 kilometres from Pa® Day.

The proximity of the Gulf of Guinea is one of several locations that claim to be the European Geographic Centre.

The area of the city is 402 square kilometres. Of these, 20.2 per cent were covered by buildings, 43.9 per cent were greened and 2.1 per cent were watered.

The climate type of arsenic varies from continental to marine climate, with an annual average temperature of 6.1°C; a monthly average temperature of 6.9°C; and an average temperature of 17.0°C in July. The average annual precipitation is 661 mm per month.

The summer temperatures of the arsenic, all of which are rated for day-to-day temperatures, can be rounded up to 20 ± 30 times, when the city’s night lives are quite active and, in the day, bars and cafés are common. The peak of summer temperature was 35.4 in July 1959.

Winter is very cold, temperatures are rarely above 0 degrees Celsius, and in January and February temperatures are less than zero 20 degrees Celsius. During the cold winter of each year, arsenic rivers and nearby lakes are frozen, and a popular disincentive is ice-fishing, with liners breaking a hole on the ice, followed by coltan. Winter minimum temperature is 37.2 in January 1940.

1.1 - Current

ville Nis are located in southern Lithuania (54°41’N 25° 17’E). There are only 40 kilometres away from the Belarusian border. The geographic location of the ville Niues lies in Lithuania, which should be attributed to changes in the country’s border profile since several centuries; in the past, the geographic centre of the Principality of Lithuania has been located in the Principality.

ville Nis are located in the veterans River and the Rivas River. It was believed that the name of Vilius would be known as the Veria River that had crossed the city. Vilnius distances from the Baltic and Lithuania’s main seaport, Kleipa 312 km. The main cities of Vilnius and Lithuania are connected to the road, distance from 102 km, Helioai 214 km, and Patrickdays 135 km.

The vicinity of the ville News are one of the locations claimed to be the European Geographical Centre.

The area of the city of Vilnes is 402 square kilometres. Of these, 20.2 per cent of the areas covered by the buildings, 43.9 per cent of the green area and 2.1 per cent of water.

The climate types of ville Nis are between the continental climate and the ocean climate, with an annual average temperature of 6.1 °C; the average temperature of 1 January - 4.9 °C, the average temperature of 17.0 °C in July. The average annual rainfall was 661 mm.

The summer warming of the ville Niues, which can be pushed back into 20 milli, occasional recuperation, 30 times when the city’s night lives are quite active, and in the white day, the bonus and the café are very prevalent. The summer peak was 35.4 in July 1959.

The winter was very cold, with little temperatures exceeding 0°C, and in January and February, the temperature was less than 20°C. In the winter of cold typhoons each year, the veterans rivers and the nearby lakes are clocked, a pandemic sterilization activity is the ice, and the fishermen are hiding a hole in the ice, followed by the chewing. The winter was the lowest of 372 in January 1940.

Chinese

大部分法律都有规定版权期限。版權過期後,作品就属于全人类共有的財產,称为公共财产或是公有领域(public domain)的財產。版权保护的时间各不相同,中国大陆台湾香港法律所設的保护期限,是作品原創者去世后50年;美国和欧洲则是作品原創者去世后70年。因此像《红楼梦》、《傲慢与偏见》之类的作品原文,现在都已经过了版权保护的期限,任何人都可以随意取得、复制或者在原作基础上继续创作,包括翻譯、引用、出版等。不過,例如翻譯後的衍生作品則不同了,如果衍生作品的作者仍然在世、或是去世但未超過上述年份,則仍然受到版權保護。

另外,有些作品的版權許可協議,自由度足以讓維基百科的編輯者使用,又或者作品屬於公有領域,但這必須得到嚴格證實。沒有版權公告的物品不代表作品可以隨便自由引用。

另外,大多数国家的版权法中也有叫做“合理使用”(fair use)的概念。这个概念十分复杂,而且各国的规定都不盡相同。基本上,您可以在不侵犯他人权益的大前提下,摘录别人作品的一小段内容。例如您要写一篇《哈利·波特》的书评、講述作者寫作手法等,您当然想引用该书中的部分内容、並加以評述。「合理使用」的概念允许您自行摘录《哈利·波特》书中的小段內容,而無需預先諮詢版權持有人。但是至于这“一小段”到底应该是多长,而且是在什么情况下才可以摘录这一小段,却没有明确的界定。

1.7 - Proposed

Most laws provide for the duration of copyright. later, works are common to all mankind.” They are referred to as public property or public domain (public domain). The duration of copyright protection varies from mainland China, Taiwan, Hong Kong law (“Hong Kong Law”) to 50 years after the death of former authors of works; the United States and Europe are 70 years after the death of former authors of works”. As a result, works such as the Red Crescent, the arrogance and prejudice have now gone through the period of copyright protection, and any person may freely acquire, reproduce or continue to create on the basis of his or her original work, including translation, quotation, publication, etc. There is a difference between work derived from it, for example, when the author of the derived work is still alive or dies, but not in the years mentioned above, it is still circumscribed.

Furthermore, some works may be produced in such a way that they may be produced in such a way that they are sufficient for the purpose of satisfying the needs of those who use them, or for the purpose of producing them in the public domain, provided that the.” Articles of the non-representative version of the bulletin may be quoted freely.

In addition, the concept of “reasonable use” (fair use) is included in the copyright laws of most countries. This concept is complex and national provisions are not identical. In essence, you can extract a small part of the work of others without encroaching on the interests of others. For example, you would like to write an assessment of Khalil Port, a copy of which, of course, you would like to quote some of the contents of the book and to quote it. The concept of “reasonable use” allows you to excerpt from the sub-paragraphs in the Khalil Port, without having to produce a first-come, first-served version of the holder. However, it should be long and, in what circumstances, it could be excerpted without a clear definition.

1.1 - Current

Most laws provide for the duration of copyright. The edition of the CBT is a common heritage of all mankind, known as public property or public domains (public domains). The time for copyright protection differs from time to time, in China’s continental, Taiwan and Hong Kong law, the period of protection of the capital, 50 years after the death of the owner of the works, and in the United States and Europe, 70 years after the death of the owner of the works. As a result, the original text of the works like the Redehouse dream, the arrogance and prejudice, has now been subject to the time frame for the protection of copyrights, and any person may continue to be created on the basis of his or her intention, including the reference, publication, etc. Unlike, for example, the derivatives of the redundant, and if the author of the derivative is still born, dead or lost, the year unfolding, it is still in the form of the batch.

In addition, some works are available for sterilization, free of charge for the use of the ministers of the hidings, or the work of the matrimonial domain, provided that the ante receives a glossary. The articles that have been published in the form of a declaration of the Fund are free of charge.

Moreover, the concept of “reasonable use” (fair use) is also known in the copyright law of most countries. This concept is complex and States do not have the same provisions. In essence, you may, without prejudice to the rights and interests of others, extract a small fraction of the works of others. For example, you would like to write a book “Harli Putt”, the stereotype writer’s written hand law, and you would of course wish to cite some of the elements of the book and the combination of the monet. The concept of rational use allows you to release a small paragraph in the Harli Putt book, which is required by the pioneering version of the holder. However, the phrase “a small paragraph” should be too long, and in what circumstances it would be possible to extract the small paragraph without a clear definition.

Based on the country-naming improvement alone, the proposed model might be better. :laughing:

But in all seriousness, the text from the proposed model is more discursive and coherent.

The new Chinese models are live on the package index.
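
For anyone who wants to pull the new models from the index in a script rather than through the GUI, here is a minimal sketch based on the usual Argos Translate Python calls. It assumes Argos Translate is installed and the index is reachable; `pick_package` and `install_from_index` are hypothetical helper names.

```python
def pick_package(available, from_code, to_code):
    """Return the first package in `available` for the given language pair."""
    for package in available:
        if package.from_code == from_code and package.to_code == to_code:
            return package
    raise LookupError(f"no package for {from_code} -> {to_code}")


def install_from_index(from_code="en", to_code="zh"):
    """Refresh the package index, then download and install one language pair."""
    # Imported lazily so pick_package can be used without Argos Translate.
    import argostranslate.package

    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    package = pick_package(available, from_code, to_code)
    argostranslate.package.install_from_path(package.download())
```

Calling `install_from_index("en", "zh")` and then `install_from_index("zh", "en")` should cover both directions discussed in this thread.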

I have tried it. The results have indeed improved greatly.

Could you help me answer the following questions?

1. What kind of corpus is used for training? Is it the data given in argos-train/data-index.json?
2. When I train a Chinese model, do the Chinese sentences in the training dataset need to be segmented/tokenized first?

Yes, the data is downloaded from here automatically by Argos Train.

No, Argos Train does tokenization for you here.
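
To illustrate the second answer: as far as I understand, Argos Train learns a SentencePiece subword model from the raw text, so the Chinese side can stay unsegmented. Below is a sketch of writing a line-aligned corpus that way. The function name and the two-file layout are assumptions for illustration, not Argos Train's actual data package format.

```python
from pathlib import Path


def write_parallel_corpus(pairs, source_path, target_path):
    """Write line-aligned files: line N of source pairs with line N of target.

    Sentences are written as-is; no spaces are inserted between Chinese
    characters. Returns the number of pairs written.
    """
    source_lines, target_lines = [], []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # skip pairs with an empty side
        if "\n" in source or "\n" in target:
            raise ValueError("each sentence must fit on one line")
        source_lines.append(source)
        target_lines.append(target)
    Path(source_path).write_text(
        "".join(line + "\n" for line in source_lines), encoding="utf-8"
    )
    Path(target_path).write_text(
        "".join(line + "\n" for line in target_lines), encoding="utf-8"
    )
    return len(source_lines)
```

The only thing this enforces is alignment: both files end up with the same number of lines, in the same order.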

Thank you very much for your help.

I did notice a slight improvement with version 1.7 of the Chinese model, but in many cases it’s still very far from the mainstream translation engines (Google, Azure, DeepL). I tried translating all my Mandarin flashcards, and the results are here: Grid // Baserow (it’s still running, but eventually there will be 2000+ entries).
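
One rough way to put a number on a comparison like the flashcard one is a character n-gram F-score between each engine's output and a reference translation. This is a simplified sketch in the spirit of chrF, not the official metric, and `ngram_f1` is a hypothetical helper name.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace and case."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def ngram_f1(hypothesis: str, reference: str, n: int = 2) -> float:
    """F1 over character n-grams shared by a translation and a reference."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Averaging `ngram_f1` over all flashcards for each engine would give a crude ranking; it rewards surface overlap only, so it is best read alongside a manual check.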

I’m a complete novice when it comes to machine translation. Is there a way to significantly improve the accuracy of Argos Translate Chinese → English translations? What would be required? For example, would contributing GPU time be useful in any way? I’m trying to understand how I could help the LibreTranslate community; I would love to have a really accurate open source Chinese → English translation engine.

Thanks for doing these experiments!

The main way to improve the translations is more high-quality training data. It only takes ~24 hours to train a new Chinese model on an RTX 3090, so GPU time isn’t a bottleneck.
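
On the data quality point, a few cheap filters tend to catch a lot of noise in a crawled en-zh corpus. Here is a sketch; the function name and thresholds are hypothetical, and this is not the pipeline used for the models in this thread.

```python
import re

# Basic CJK Unified Ideographs block; enough for a rough "is this Chinese?" check.
CJK = re.compile(r"[\u4e00-\u9fff]")


def clean_pairs(pairs, max_ratio=4.0):
    """Filter (english, chinese) pairs; keeps the first copy of each duplicate."""
    seen = set()
    kept = []
    for en, zh in pairs:
        en, zh = en.strip(), zh.strip()
        if not en or not zh:
            continue  # one side is empty
        if not CJK.search(zh) or CJK.search(en):
            continue  # wrong language on one side, likely untranslated text
        # Compare English word count with Chinese character count.
        ratio = len(en.split()) / len(CJK.findall(zh))
        if ratio > max_ratio or ratio < 1 / max_ratio:
            continue  # lengths too different, likely misaligned
        if (en, zh) in seen:
            continue  # exact duplicate
        seen.add((en, zh))
        kept.append((en, zh))
    return kept
```

The `max_ratio` default is a guess and would need tuning against a hand-checked sample before trusting it on a real corpus.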

There’s surprisingly little data available for the en-zh language pair, which is part of why the translations aren’t great. I think it’s also difficult to translate Chinese->English because they’re very different languages.

Novice question: where can I see the data that was used to train the zh->en model? What are the generally accepted rules for enhancing that data? Is using data generated by other translation engines acceptable?

All of the current data comes from Opus. Then I have a system for packaging the data for Argos Train.

I normally only use data with open source licenses. Some datasets have non-commercial restrictions, and I don’t use them. Training data generated by other translation engines should work well as long as their terms allow it.

I found this paper, which explores using the romanized form of Chinese (pinyin) as input/output for Chinese characters. I wonder if that could help; I may try it out in the future.

Argos Translate now has Opus-MT models for Chinese and Traditional Chinese.

If anyone is a native speaker, I’d love to hear feedback.
