Many concepts have been proposed for meta learning with neural networks (NNs), e.g., NNs that learn to reprogram fast weights, Hebbian plasticity, learned learning rules, and meta recurrent NNs. Our Variable Shared Meta Learning (VSML) unifies the above and demonstrates that simple weight-sharing and sparsity in an NN are sufficient to express powerful learning algorithms (LAs) in a reusable fashion. A simple implementation of VSML, in which the weights of a neural network are replaced by tiny LSTMs, can implement the backpropagation LA solely by running in forward mode. It can even meta learn new LAs that differ from online backpropagation and generalize to datasets outside of the meta training distribution without explicit gradient calculation. Introspection reveals that our meta learned LAs learn through fast association in a way that is qualitatively different from gradient descent.
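To make the core mechanism concrete, here is a minimal sketch (not the authors' implementation) of a dense layer whose scalar weights are replaced by tiny LSTMs with shared parameters. The class name `VSMLDenseLayer`, the message and hidden dimensions, the sum aggregation, and the linear read-out are illustrative assumptions.

```python
# Sketch of the core VSML idea: every scalar weight W[i, j] of a dense layer is
# replaced by a tiny LSTM. Crucially, the LSTM *parameters* are shared across
# all (i, j) positions; only the per-synapse hidden states differ.
import torch
import torch.nn as nn


class VSMLDenseLayer(nn.Module):
    def __init__(self, n_in: int, n_out: int, msg_dim: int = 8, hidden_dim: int = 16):
        super().__init__()
        self.n_in, self.n_out = n_in, n_out
        self.msg_dim, self.hidden_dim = msg_dim, hidden_dim
        # One LSTMCell shared by all n_in * n_out "synapses".
        self.cell = nn.LSTMCell(input_size=msg_dim, hidden_size=hidden_dim)
        # Read a forward message out of each synapse's hidden state.
        self.readout = nn.Linear(hidden_dim, msg_dim)
        self.reset_states()

    def reset_states(self):
        # Per-synapse LSTM states (plain tensors, not parameters),
        # shape (n_in * n_out, hidden_dim).
        n = self.n_in * self.n_out
        self.h = torch.zeros(n, self.hidden_dim)
        self.c = torch.zeros(n, self.hidden_dim)

    def forward(self, x_msg: torch.Tensor) -> torch.Tensor:
        # x_msg: per-input-unit messages, shape (n_in, msg_dim).
        # Broadcast each input message to every outgoing synapse.
        inp = x_msg.repeat_interleave(self.n_out, dim=0)      # (n_in*n_out, msg_dim)
        self.h, self.c = self.cell(inp, (self.h, self.c))
        out = self.readout(self.h).view(self.n_in, self.n_out, self.msg_dim)
        # Aggregate incoming messages at each output unit.
        return out.sum(dim=0)                                  # (n_out, msg_dim)
```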
Is it possible to implement modifiable versions of backpropagation or related algorithms as part of the end-to-end differentiable activation dynamics of a neural net (NN), instead of inserting them as separate fixed routines? Here we propose the Variable Shared Meta Learning (VSML) principle for this purpose. It introduces a novel way of using sparsity and weight-sharing in NNs for meta learning. We build on the arguably simplest neural meta learner, the meta recurrent neural network (Meta RNN) [16,10,56], by replacing the weights of a neural network with tiny LSTMs. The resulting system can be viewed as many RNNs passing messages to each other, as one big RNN with a sparse shared weight matrix, or as a system that learns each neuron's functionality together with its LA. VSML generalizes the principle behind end-to-end differentiable fast weight programmers [45,46,3,41], hypernetworks [14], learned learning rules [4,13,33], and Hebbian-like synaptic plasticity [44,46,25,26,30]. Our mechanism, VSML, can implement backpropagation solely in the forward dynamics of an RNN. Consequently, it enables meta-optimization of backprop-like algorithms. We envision a future where novel methods of credit assignment can be meta learned while still generalizing across vastly different tasks.
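To illustrate how such a system could be meta-optimized, the following sketch unrolls the forward dynamics of the hypothetical `VSMLDenseLayer` above over a short online-learning episode on a toy regression task and updates the shared LSTM parameters by backpropagating through the unrolled episode. The message encoding (input in channel 0, previous error fed back in channel 1), the task, and all hyperparameters are assumptions for illustration, not the paper's exact meta-training setup.

```python
# Outer (meta) loop: adjust the shared LSTM parameters so that the inner
# forward-mode dynamics act as a learning algorithm on a toy regression task.
import torch

torch.manual_seed(0)
n_in, n_out, msg_dim = 4, 1, 8
layer = VSMLDenseLayer(n_in, n_out, msg_dim=msg_dim)   # from the sketch above
meta_opt = torch.optim.Adam(layer.parameters(), lr=1e-3)

for episode in range(100):                   # outer (meta) loop over tasks
    layer.reset_states()                     # fresh learner state per task
    w_true = torch.randn(n_in)               # toy task: online linear regression
    prev_err = torch.zeros(n_in)
    meta_loss = 0.0
    for t in range(20):                      # inner (online learning) loop
        x = torch.randn(n_in)
        y = w_true @ x
        msg = torch.zeros(n_in, msg_dim)
        msg[:, 0] = x                        # channel 0: input features
        msg[:, 1] = prev_err                 # channel 1: feedback from last step
        out = layer(msg)                     # "learning" happens in forward mode
        y_hat = out[0, 0]                    # read prediction from output message
        err = y_hat - y
        prev_err = err.detach().repeat(n_in)
        meta_loss = meta_loss + err ** 2
    meta_opt.zero_grad()
    meta_loss.backward()                     # backprop through the unrolled episode
    meta_opt.step()
```

At meta test time the outer loop and `backward()` call would be dropped: the meta learned LA then runs purely in the forward dynamics, which is what allows it to be applied to tasks outside the meta training distribution without explicit gradient calculation.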
I think this type of meta learning strategy, where you learn something analogous to backpropagation, is going to become more common. However, I still haven't seen many examples of it being practical for large neural networks.