The authors suggest treating translation as a generic language modeling task, predicting the next token in a sequence, rather than using separate networks to encode the input text and decode the output text. In practice this would mean doing few-shot translation with an autoregressive, decoder-only Transformer model.
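As a rough illustration of the idea, here is a minimal sketch of few-shot translation framed as next-token prediction. The Hugging Face `transformers` API and the `gpt2` checkpoint are illustrative assumptions, not the authors' setup; any causal (decoder-only) language model could be swapped in.

```python
# Sketch: translation as plain next-token prediction with a decoder-only LM.
# Model choice and library are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Few-shot prompt: a couple of translation examples, then the sentence to translate.
prompt = (
    "English: The cat sleeps. French: Le chat dort.\n"
    "English: I like coffee. French: J'aime le cafe.\n"
    "English: Where is the station? French:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,                     # the "translation" is just more next tokens
    do_sample=False,                       # greedy decoding keeps the sketch deterministic
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated continuation after the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

The point of the sketch is the framing: there is no separate encoder or decoder network, just one model continuing a token sequence, and the in-context examples are what steer that continuation toward translation.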