SGConv - Structural Global Convolution Networks

In the past few years, Transformer networks have been the dominant architecture for natural language processing: they train efficiently in parallel and model the context a word appears in well. Transformers work by computing an attention value between every pair of tokens in the input sequence, which is O(n²) in sequence length, so it's difficult to scale them up to long input sequences. Argos Translate currently uses Transformers and works around this problem by splitting the input text into sentences and translating each sentence independently. However, this means translations don't have access to any context from nearby sentences.
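The quadratic cost comes from the attention score matrix itself, which has one entry per pair of tokens. A minimal NumPy sketch of single-head attention (not Argos Translate's actual implementation) makes this visible:

```python
import numpy as np

def attention(Q, K, V):
    # The score matrix is (n, n): one value for every pair of tokens.
    # This matrix is why attention costs O(n^2) in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax over each row so every token's attention weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 6, 4  # toy sequence length and embedding size
x = np.random.default_rng(0).normal(size=(n, d))
out = attention(x, x, x)  # self-attention: shape (n, d)
```

Doubling `n` quadruples the size of `scores`, which is the scaling bottleneck the post describes.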

Convolutional Neural Networks (CNNs) are frequently used in image processing tasks. A CNN passes a convolution filter over sections of an image, extracting a more concise representation of each patch of pixels. By stacking these local aggregations, a CNN builds up a global understanding of the image from information about its regions.
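The same idea applies in one dimension over a token sequence. This illustrative sketch (not from the paper) slides a small filter along a sequence; each output summarizes a local window, and the total cost is O(n·k) for kernel length k rather than O(n²):

```python
import numpy as np

def conv1d(x, kernel):
    # Slide the filter over the sequence; each output position
    # aggregates a window of k neighboring values.
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.arange(8, dtype=float)
# A length-3 averaging filter: each output summarizes 3 neighbors.
out = conv1d(x, np.array([1.0, 1.0, 1.0]) / 3)
```

Stacking such layers widens the receptive field, which is how local filters eventually aggregate global context.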

This paper shows that CNNs can be used effectively for NLP on long sequences of text tokens. The convolutions weight nearby tokens most heavily, but are also able to draw on context from long-range dependencies.
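One way to picture this is a kernel that spans the entire sequence but whose weights decay with distance, so nearby tokens dominate while far-away tokens still contribute. The sketch below is a simplified assumption (a plain exponential decay, not the paper's exact kernel parameterization), applied with an FFT so the global convolution costs O(n log n) instead of O(n²):

```python
import numpy as np

n = 16
# Assumed decaying global kernel: weight 1 at distance 0, shrinking
# geometrically with distance, but nonzero across the whole sequence.
kernel = 0.5 ** np.arange(n)
x = np.random.default_rng(1).normal(size=n)

# FFT-based convolution applies the length-n kernel in O(n log n).
# Zero-pad to 2n to get a linear (causal) convolution, then keep
# the first n outputs: out[i] = sum_j kernel[j] * x[i - j].
out = np.fft.irfft(np.fft.rfft(x, 2 * n) * np.fft.rfft(kernel, 2 * n), 2 * n)[:n]
```

Each output position mixes the whole prefix of the sequence, weighted toward its neighbors, which is the intuition behind using convolutions for long-range context.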