I’ve done some experiments training models on unstructured text data by splitting each sentence in half and “translating” the first half to recover the second.
For example:
{"q":"I baked a cake ", "source":"auto", "target":"infer"}
↓
{"translatedText": "for my friend's birthday party."}
I haven’t had very good results so far, but I think this would work well with more powerful models. It’s similar in spirit to the AlexaTM model, which used a translation-style encoder-decoder architecture for text generation.