The Chinese model isn’t very good; we should train a new one.
I trained a new English → Chinese model. I’m still testing, but I think it’s an improvement over the current Chinese package, at least in some cases.
Ich will den Kreuzstab gerne tragen (lit. ‘“I will gladly carry the cross-staff”’), BWV 56, is a church cantata composed by Johann Sebastian Bach for the 19th Sunday after Trinity. It was first performed in Leipzig on 27 October 1726. The composition is a solo cantata (German: Solokantate) because, apart from the closing chorale, it requires only a single vocal soloist, in this case a bass. The autograph score is one of a few cases where Bach referred to one of his compositions as a cantata. In English, the work is commonly referred to as the Kreuzstab cantata. Bach composed the cantata in his fourth year as Thomaskantor; it is regarded as part of his third cantata cycle.
The text was written by Christoph Birkmann, a student of mathematics and theology in Leipzig who collaborated with Bach. He describes in the first person a Christian willing to “carry the cross” as a follower of Jesus. The poet compares life to a voyage towards a harbour, referring indirectly to the prescribed Gospel reading which says that Jesus travelled by boat. The person, at the end, yearns for death as the ultimate destination, to be united with Jesus. This yearning is reinforced by the closing chorale: the stanza “Komm, o Tod, du Schlafes Bruder” (“Come, o death, you brother of sleep”) from Johann Franck’s 1653 hymn “Du, o schönes Weltgebäude”, which uses the imagery of a sea voyage.
Current en->zh model
我希望登缩Kreuzstab gerne tragen(lit. “I” ,我高兴地携带交叉人员,“BWV56”,这是由Johann Sebastian Bach组成的教堂,在Trinity后的19个星期。 这是最早于10月27日在莱普齐格举行的。 组成是一家女市长(希腊:特派),因为除了关闭的服装外,它只要求一名单一的联络点,在这种情况下,只需要一名邮票。 汽车分数是几个案例之一,其中每一个人都提到其组成是可以想象的。 英文说,工作通常称为Kreuzstabta。 每一位成员在第四年组成了塔塔塔尔,被看作是其第三大可能塔周期的一部分。
该案文由Christooph Birkmann撰写,这是数学的学生,也是与Bach合作的Leipzig学。 他首先描述一个愿意“跨越”的基督教徒。 诗歌将生命与朝一港口的旅舍相比较,间接提及了所谓的“热爱”乘船。 最终,作为最终目的死亡的年终者,将团结起来。 这一年因关闭的服装而得到加强:Sstanza “Komm, o Tod, du Schlafes Bruder”(“Come, 死亡,你的兄弟睡觉”,来自Johann Franck’1653 hymn “Du, oschönes Weltgebäude”,利用海上航线的图像。
Ich will den Kreuzstab gerne tragen (lit `I will grly re the cross-staff’), BWV 56, is a Church canta composed of Johann Sebastian Bach for the 19th Sunday after Tri. 第一次于1726年10月27日在莱普齐格进行。 组成是一个 so子公司(German: Solokantate),因为除了关闭园区外,它只需要一个单一的协调人,在这种情况下需要一个大使馆。 自动记分数是Bach把其组成之一称为“直系”。 英文的工作通常称为Kruzstab canta。 每一家都在其第四年作为托马斯卡纳克特人组成,被视为他第三个罐体周期的一部分。
案文由Leipzig数学和神学学生Christoph Birkmann撰写,他与Bach合作。 他首先描述了一名愿意“穿过十字路口”的基督教信徒。 诗人把生命与前往港口的航程相比较,间接地提到《福音》的传译,其中说,耶稣乘坐船。 此人最终将死亡作为最终目的地,与耶稣团结在一起。 关闭的园区“Komm, o Tod, du Schlafes Bruder”(“Come, o death, 您的兄弟睡觉”)从Johann Franck的1653年“Du, o schönes Weltgebäude”上演,后者使用了海上航程的图像。
In September and October 2022 the Conservative government of the United Kingdom, led by the newly-appointed prime minister Liz Truss, faced a credibility crisis that led to Truss’ resignation.
The crisis began following the September mini-budget, which was received negatively by the world financial markets. It ultimately led to the dismissal of the chancellor of the Exchequer, Kwasi Kwarteng, on 14 October, and his replacement by Jeremy Hunt. In the following days Truss came under increasing pressure to reverse further elements of the mini-budget to satisfy the markets, and by 17 October five Conservative members of parliament had called for her resignation.
危机始于9月份的小预算,这是世界金融市场的负面影响。 最终导致10月14日Kwasi Kwarteng被撤职,他由Jeremy Hunt取代。 此后几天,入侵越来越严重,迫使其扭转小预算的更多因素,以满足市场,到10月17日,议会五个观察成员要求辞职。
危机始于9月份的小型预算,而这一预算受到世界金融市场的消极影响。 最终导致10月14日开除了Exchequer、Kwasi Kwarteng的陪审员,并由杰里米·亨特接替他。 在随后的几天里,Trus人受到越来越大的压力,要扭转小型预算中满足市场需求的内容,到10月17日,议会的五名保守成员要求她辞职。
Upon Sweyn Forkbeard’s death, Canute’s brother Harald was King of Denmark. Canute went to Harald to ask for his assistance in the conquest of England, and the division of the Danish kingdom. His plea for division of kingship was denied, though, and the Danish kingdom remained wholly in the hands of his brother, although, Harald lent to Canute the command of the Danes in any attempt he had a mind to make on the English throne. Harald probably saw it was out of his hands anyway. It was a vendetta that held his brother, Canute, and the Vikings driven away in spite of their conquest with Forkbeard. They were bound to fight again, on the basis of vengeance for betrayal.
It is possible Harald was at the siege of London, and the King of Denmark was content with Canute in control of the army. His name was to enter the fraternity of Christ Church, Canterbury, at some point, in 1018, although it is unsure if it was before or after he went home to Denmark with the invasion fleet of his Danes.
In 1018, Harold II died and Canute succeeded him. In 1019, he was to return to Denmark to over-winter, and affirm his succession to the Danish crown. With a Letter in which he states intentions to avert troubles to be done against England, it seems Danes were set against him, and the attack on the Wends was possibly part of his suppression of dissent. In the spring of 1020 he was back in England, his hold on Denmark presumably stable. Ulf Jarl, his brother-in-law, was his appointee as the Earl of Denmark.
在Sweyn Forkbeard去世后,Caute’兄弟Harald是丹麦国王。 对Harald要求他协助寻求英格兰和丹麦王国的分裂。 他对国王的分裂表示不服,但丹麦王国完全以他的兄弟为手,尽管Harald lent to Canute the指挥 of the Danes的任何企图都用英文诗歌。 哈利德可能发现,这几条道路是手无的。 这是一个煽动的朋友,尽管他们寻求福克贝德。 他们不得不再次以报复为条件进行战斗。
在1018年,哈德二世去世,加拉特接替他。 在1019年,他将返回丹麦,并申明他继承丹麦王位。 他在信中表示打算避免对英格兰的麻烦,他似乎对他的袭击,而对我们的进攻很可能是他镇压不满的一部分。 在1020年春天,他在英格兰返回,他对丹麦的声望稳定。 Ulf Jarl,他的兄弟法是他被任命为丹麦的埃拉尔。
Upon Sweyn Forkbeard的去世,Canute的兄弟Harald是丹麦国王。 Canute去Harald,要求他协助英国征服和丹麦王国分裂。 他的国王分离请求遭到拒绝,但丹麦王国仍然完全掌握着他的兄弟的手,尽管哈勒德曾试图在英国王位上调丹麦人的指挥。 Harald可能看到他没有手。 它是一座etta子,他兄弟Canute和Vikings不顾与Forkbeard的交火被赶走。 他们必须在复仇基础上再次作战。
1018年,Harold II死亡,Canute接替他。 1019年,他将返回丹麦,再过间,并确认他继承丹麦王。 他在信中表示有意避免对英格兰造成麻烦,似乎丹麦人反对他,对温斯的袭击可能是他压制不同意见的一部分。 1020年的春天,他回到英格兰,对丹麦的统治大概稳定。 Ulf Jarl是他的兄弟,是丹麦的委托人。
It would be interesting to hear from Chinese speakers here on the forum about the improvements.
Here’s the Chinese → English model:
1.7 - Proposed
Ayilia Plet, Minister of Justice, Revolution of Lithuania, Poland. In 1830, when the intifada erupted in November, she formed a small force to participate in the intifada, to fight Russian imperial forces, to participate in several battles and to obtain the rank of Captain of the intifada. On 23 December 1831, following the repression of the intifada, Plat died as a result of illness and was only 25 years old. As a representative of the intifada, she was considered a national hero in countries such as Bo, Lithuania and Belarus.
1.1 - Current
Emilial Pellet, Poland’s distinguished ethnic, revolutionary. In 1830, she formed a small force to participate in the intifada, engage in battles with the Russian embassy, engage in several battles and obtain the title of the Lord’s Army. On 23 December 1831, following the repression of the Intifada, Pleit died of illness and only 25 years of age. As a representative of the intifada, she was seen as a national heroic in countries such as Porn, Lithuania and Belarus.
维尔纽斯的气候类型介于大陆性气候与海洋性气候之间，年平均气温为6.1 °C；一月平均气温为−4.9 °C，七月平均气温位17.0 °C。年平均降水量月为661毫米。
1.7 - Proposed
The 位于 is located in the south-east of Lithuania (54°41’N 25°17’E). There are only 40 kilometres from the Belarus border. The geographical location of the Gulf of Guinea is situated in the corner of Lithuania, which is due to changes in the country’s boundaries over centuries; in the past, the Gulf of Guinea has been in the geographical centre of the Grand Duchy of Lithuania.
The 位于 is located in the junctions of the Vernia and Neris rivers. It is believed that the name of the thorium will be known to pass through the city of Vernia. It is 312 kilometres from the main seaports of the Baltic Sea and Lithuania. Road links exist between Brunei and other major Lithuanian cities, 102 kilometres from Cunas, 214 kilometres from Chioli and 135 kilometres from Pa® Day.
The proximity of the Gulf of Guinea is one of several locations that claim to be the European Geographic Centre.
The area of the city is 402 square kilometres. Of these, 20.2 per cent were covered by buildings, 43.9 per cent were greened and 2.1 per cent were watered.
The climate type of arsenic varies from continental to marine climate, with an annual average temperature of 6.1°C; a monthly average temperature of 6.9°C; and an average temperature of 17.0°C in July. The average annual precipitation is 661 mm per month.
The summer temperatures of the arsenic, all of which are rated for day-to-day temperatures, can be rounded up to 20 ± 30 times, when the city’s night lives are quite active and, in the day, bars and cafés are common. The peak of summer temperature was 35.4 in July 1959.
Winter is very cold, temperatures are rarely above 0 degrees Celsius, and in January and February temperatures are less than zero 20 degrees Celsius. During the cold winter of each year, arsenic rivers and nearby lakes are frozen, and a popular disincentive is ice-fishing, with liners breaking a hole on the ice, followed by coltan. Winter minimum temperature is 37.2 in January 1940.
1.1 - Current
ville Nis are located in southern Lithuania (54°41’N 25° 17’E). There are only 40 kilometres away from the Belarusian border. The geographic location of the ville Niues lies in Lithuania, which should be attributed to changes in the country’s border profile since several centuries; in the past, the geographic centre of the Principality of Lithuania has been located in the Principality.
ville Nis are located in the veterans River and the Rivas River. It was believed that the name of Vilius would be known as the Veria River that had crossed the city. Vilnius distances from the Baltic and Lithuania’s main seaport, Kleipa 312 km. The main cities of Vilnius and Lithuania are connected to the road, distance from 102 km, Helioai 214 km, and Patrickdays 135 km.
The vicinity of the ville News are one of the locations claimed to be the European Geographical Centre.
The area of the city of Vilnes is 402 square kilometres. Of these, 20.2 per cent of the areas covered by the buildings, 43.9 per cent of the green area and 2.1 per cent of water.
The climate types of ville Nis are between the continental climate and the ocean climate, with an annual average temperature of 6.1 °C; the average temperature of 1 January - 4.9 °C, the average temperature of 17.0 °C in July. The average annual rainfall was 661 mm.
The summer warming of the ville Niues, which can be pushed back into 20 milli, occasional recuperation, 30 times when the city’s night lives are quite active, and in the white day, the bonus and the café are very prevalent. The summer peak was 35.4 in July 1959.
The winter was very cold, with little temperatures exceeding 0°C, and in January and February, the temperature was less than 20°C. In the winter of cold typhoons each year, the veterans rivers and the nearby lakes are clocked, a pandemic sterilization activity is the ice, and the fishermen are hiding a hole in the ice, followed by the chewing. The winter was the lowest of 372 in January 1940.
Most laws provide for the duration of copyright. later, works are common to all mankind.” They are referred to as public property or public domain (public domain). The duration of copyright protection varies from mainland China, Taiwan, Hong Kong law (“Hong Kong Law”) to 50 years after the death of former authors of works; the United States and Europe are 70 years after the death of former authors of works”. As a result, works such as the Red Crescent, the arrogance and prejudice have now gone through the period of copyright protection, and any person may freely acquire, reproduce or continue to create on the basis of his or her original work, including translation, quotation, publication, etc. There is a difference between work derived from it, for example, when the author of the derived work is still alive or dies, but not in the years mentioned above, it is still circumscribed.
Furthermore, some works may be produced in such a way that they may be produced in such a way that they are sufficient for the purpose of satisfying the needs of those who use them, or for the purpose of producing them in the public domain, provided that the.” Articles of the non-representative version of the bulletin may be quoted freely.
In addition, the concept of “reasonable use” (fair use) is included in the copyright laws of most countries. This concept is complex and national provisions are not identical. In essence, you can extract a small part of the work of others without encroaching on the interests of others. For example, you would like to write an assessment of Khalil Port, a copy of which, of course, you would like to quote some of the contents of the book and to quote it. The concept of “reasonable use” allows you to excerpt from the sub-paragraphs in the Khalil Port, without having to produce a first-come, first-served version of the holder. However, it should be long and, in what circumstances, it could be excerpted without a clear definition.
Most laws provide for the duration of copyright. The edition of the CBT is a common heritage of all mankind, known as public property or public domains (public domains). The time for copyright protection differs from time to time, in China’s continental, Taiwan and Hong Kong law, the period of protection of the capital, 50 years after the death of the owner of the works, and in the United States and Europe, 70 years after the death of the owner of the works. As a result, the original text of the works like the Redehouse dream, the arrogance and prejudice, has now been subject to the time frame for the protection of copyrights, and any person may continue to be created on the basis of his or her intention, including the reference, publication, etc. Unlike, for example, the derivatives of the redundant, and if the author of the derivative is still born, dead or lost, the year unfolding, it is still in the form of the batch.
In addition, some works are available for sterilization, free of charge for the use of the ministers of the hidings, or the work of the matrimonial domain, provided that the ante receives a glossary. The articles that have been published in the form of a declaration of the Fund are free of charge.
Moreover, the concept of “reasonable use” (fair use) is also known in the copyright law of most countries. This concept is complex and States do not have the same provisions. In essence, you may, without prejudice to the rights and interests of others, extract a small fraction of the works of others. For example, you would like to write a book “Harli Putt”, the stereotype writer’s written hand law, and you would of course wish to cite some of the elements of the book and the combination of the monet. The concept of rational use allows you to release a small paragraph in the Harli Putt book, which is required by the pioneering version of the holder. However, the phrase “a small paragraph” should be too long, and in what circumstances it would be possible to extract the small paragraph without a clear definition.
Based on the country naming improvement alone the proposed model might be better.
But in seriousness, the text in the proposed model is more discursive and coherent.
The new Chinese models are live on the package index.
I have tried it. The results have indeed improved greatly.
Could you help me answer the following questions?
1. What kind of corpus is used for training? Is it the data listed in argos-train/data-index.json?
2. When I train a Chinese model, do the Chinese sentences in the training dataset need to be segmented/tokenized first?
Yes, the data is downloaded from here automatically by Argos Train.
No, Argos Train does tokenization for you here.
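As a rough illustration of how a data index like the one mentioned above could be inspected for a given language pair, here is a minimal sketch. The field names (`name`, `from_code`, `to_code`, `size`) and the three entries are assumptions made for this example, not the actual contents of the Argos Train index, and matching a pair in either direction is a choice made for the sketch.

```python
import json

# Hypothetical excerpt of a data index. The field names and entries
# are assumptions for illustration only.
DATA_INDEX_JSON = """
[
  {"name": "example-corpus-a", "from_code": "en", "to_code": "zh", "size": 1000},
  {"name": "example-corpus-b", "from_code": "en", "to_code": "de", "size": 2000},
  {"name": "example-corpus-c", "from_code": "en", "to_code": "zh", "size": 500}
]
"""


def datasets_for_pair(index, from_code, to_code):
    """Return the index entries that cover a language pair, in either direction."""
    pair = {from_code, to_code}
    return [entry for entry in index if {entry["from_code"], entry["to_code"]} == pair]


index = json.loads(DATA_INDEX_JSON)
selected = datasets_for_pair(index, "zh", "en")
names = [entry["name"] for entry in selected]
total_size = sum(entry["size"] for entry in selected)
print(names, total_size)
```

Summing the `size` field gives a quick sense of how much parallel data is available for a pair before any training is attempted.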
Thank you very much for your help.
I did notice a slight improvement with version 1.7 of the Chinese model, but in many cases it’s still very far from the mainstream translation engines (Google, Azure, DeepL). I tried to translate all my Mandarin flashcards and the result is here: Grid // Baserow (it’s still running, but eventually there will be 2000+ entries).
I’m a complete novice when it comes to machine translation. Is there a way to significantly improve the accuracy of Argos Translate Chinese → English translations? What would be required? For example, is contributing GPU time in any way useful? I’m trying to understand how I could help the LibreTranslate community; I would love to have a really accurate open source Chinese->English translation engine.
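For anyone repeating the flashcard comparison, one simple way to find the entries where a model's output differs most from a reference translation is a surface string similarity score. The sketch below uses Python's standard `difflib`; the two flashcard rows are made up for illustration, and the score only measures character overlap, so it is a rough triage aid rather than a real translation quality metric.

```python
from difflib import SequenceMatcher


def similarity(candidate, reference):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, candidate.lower(), reference.lower()).ratio()


def rank_by_similarity(rows):
    """Sort (source, candidate, reference) rows from least to most similar."""
    scored = [(similarity(cand, ref), src, cand, ref) for src, cand, ref in rows]
    return sorted(scored, key=lambda item: item[0])


# Hypothetical flashcards: (source, model output, reference translation).
rows = [
    ("你好", "Hello", "Hello"),
    ("谢谢", "Thank", "Thank you"),
]

ranked = rank_by_similarity(rows)
for score, src, cand, ref in ranked:
    print(f"{score:.2f}  {src}  {cand!r} vs {ref!r}")
```

The lowest-scoring rows come first, which makes it easier to review the worst mismatches by hand.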
Thanks for doing these experiments!
The main way to improve the translations is more high-quality training data. It only takes ~24 hours to train a new Chinese model on an RTX 3090, so GPU time isn’t a bottleneck.
There’s surprisingly little data available for the en-zh language pair, which is part of why the translations aren’t great. I think it’s also difficult to translate Chinese->English because they’re very different languages.
Novice question: where can I see the data that was used to train the zh->en model? What are the generally accepted rules about how to enhance that data? Is using data generated by other translation engines acceptable?
All of the current data comes from Opus. Then I have a system for packaging the data for Argos Train.
I normally only use data with open source licenses. Some datasets have non-commercial restrictions and I don’t use them. Using training data generated by other translation engines should work well as long as their terms allow it.
Found this paper which explores using the romanized form of Chinese (pinyin) as input/output for Chinese characters. Wondering if that could help, may try it out in some future endeavors.
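To make the pinyin idea concrete, here is a minimal sketch of romanizing Chinese text as a preprocessing step. It does not reproduce the paper's method, which isn't described in this thread. The lookup table is a tiny hand-written toy with tone numbers, and `romanize` is a hypothetical helper; a real pipeline would need a full dictionary and some way to handle characters with more than one reading.

```python
# Toy character-to-pinyin table (tone numbers), for illustration only.
TOY_PINYIN = {
    "你": "ni3",
    "好": "hao3",
    "世": "shi4",
    "界": "jie4",
}


def romanize(text, table=TOY_PINYIN, unknown="<unk>"):
    """Map each character to its pinyin syllable, marking unknown characters."""
    return " ".join(table.get(ch, unknown) for ch in text)


romanized = romanize("你好世界")
print(romanized)
```

Going back from pinyin to characters is the harder direction, since many characters share the same syllable, so this sketch only covers the input side.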