I’m currently working mostly on the training scripts, both to better automate training and to train more and better models. There are also a number of open tickets for various smaller things.
Looking forward, breaking changes in 2.0 are still a ways off, but I want to do single-character tokenization and seq2seq sentence boundary detection. Few-shot translation is already implemented and, depending on how the field progresses, may play a larger role in later versions.
Repost from GitHub