User request to train a Chinese -> Chinese (Traditional) model

We currently translate zh->zt by pivoting through English:

# Current implementation
zh -> en -> zt

# With zh -> zt model
zh -> zt

I’m not sure how commonly used the zh-zt pair is, but I think it would make sense to have a dedicated model for it.
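To make the difference between the two routes concrete, here is a minimal sketch of pivoting. The `pivot` helper and the toy `zh_to_en` / `en_to_zt` functions are hypothetical stand-ins written for this illustration; they are not the project's actual API or models.

```python
from typing import Callable

Translator = Callable[[str], str]


def pivot(first: Translator, second: Translator) -> Translator:
    """Chain two translators through an intermediate language."""
    def translate(text: str) -> str:
        return second(first(text))
    return translate


# Toy dictionary lookups standing in for real models (assumption, illustration only).
def zh_to_en(text: str) -> str:
    return {"谢谢": "thank you"}.get(text, text)


def en_to_zt(text: str) -> str:
    return {"thank you": "謝謝"}.get(text, text)


# zh -> en -> zt: any meaning lost in the English step cannot be recovered.
zh_to_zt_pivot = pivot(zh_to_en, en_to_zt)
print(zh_to_zt_pivot("谢谢"))
```

A direct zh -> zt model would replace the two chained calls with a single one, so errors from the intermediate English step would no longer compound.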

See the picture. It looks like the result of a Chinese → English → Chinese (Traditional) pivot. Is multi-language translation implemented this way?
More importantly, the result is not Traditional Chinese: both the input and the output are Simplified Chinese.
Also, Chinese (Simplified) → Chinese (Traditional) doesn’t need an intermediate language, because there is a largely one-to-one correspondence between simplified and traditional Chinese characters.
Maybe there is room for optimization. Thank you for your work.
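As a rough illustration of the character correspondence described above, a conversion can be sketched as a plain lookup table. The six-entry table and the `to_traditional` name are assumptions made for this example; a real table would cover thousands of characters, and some simplified characters map to more than one traditional form, which a plain table cannot resolve.

```python
# Tiny illustrative table (assumption: hand-picked entries, far from complete).
S2T = str.maketrans({
    "说": "說",
    "语": "語",
    "国": "國",
    "汉": "漢",
    "简": "簡",
    "体": "體",
})


def to_traditional(text: str) -> str:
    """Replace each simplified character that has a table entry; keep the rest."""
    return text.translate(S2T)


print(to_traditional("简体汉语"))  # 簡體漢語
```

Characters that are identical in both scripts, such as 中 or 文, simply pass through unchanged.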

I’ve been studying Chinese for a year, and what I can already tell is that there is little use for such a translation model between simplified and traditional Chinese: the difference comes down to how you write some of the character components.
For instance, the “speech” radical (言) takes about half a dozen strokes in traditional writing, while in simplified writing (讠) it looks like an i, and so on. Simplified writing was introduced under Mao, which is why most overseas Chinese communities still use traditional writing.
A rule-based post-processing step can easily handle this kind of conversion.
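A minimal sketch of what such a rule-based step could look like, assuming a small phrase table that is tried before a per-character fallback. The tables and the `convert` function are hypothetical and written for this example; phrase rules matter because 发 becomes 髮 in 头发 (hair) but 發 in 发展 (develop). A maintained conversion table, for example the one shipped with the OpenCC project, would be a more realistic source of rules.

```python
# Hypothetical rule tables for illustration; real tables are far larger.
PHRASES = {"头发": "頭髮", "发展": "發展"}   # context-dependent words
CHARS = {"头": "頭", "发": "發", "长": "長"}  # per-character fallback


def convert(text: str) -> str:
    """Longest-match phrase replacement with a character-level fallback."""
    max_len = max(len(p) for p in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: convert a single character, or keep it as-is.
            ch = text[i]
            out.append(CHARS.get(ch, ch))
            i += 1
    return "".join(out)


print(convert("头发很长"))  # 頭髮很長
```

The fallback picks one traditional form per character, so any ambiguous word missing from the phrase table would still be converted incorrectly.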
Ancient texts are nearly impossible to translate anyway, because the grammar was reformed during the first Chinese Republic a century ago.
Chinese is not the only language to have undergone dramatic changes during the last century: such reforms were deemed necessary to overcome mass illiteracy in many countries, Turkey being another example. Portugal and the Soviet Union simplified grammar and writing too, though not to the point of making the written language unrecognizable to older generations.
Even if one tried to train a model to that end, there’s simply not enough data on OPUS to make it worthwhile (300K sentence pairs, 95% of which are in the notoriously useless CCMultiAligned), so the result would be worse than the current pivot.
