Encoding-based memory for recurrent neural networks
Carta A.; Sperduti A.; Bacciu D.
2021-01-01
Abstract
Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study memorization from the point of view of the design and training of recurrent neural networks. In particular, we investigate how to maximize the short-term memory of recurrent units, an objective that is difficult to achieve using backpropagation. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. Experimental results on synthetic and real-world datasets show that the chosen encoding mechanism is superior to static encodings such as orthogonal models and the delay line. It also outperforms RNN and LSTM units trained with stochastic gradient descent. Experiments on symbolic music modeling show that the training algorithm specialized for the memorization component improves the final performance compared to stochastic gradient descent.
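To make the architecture described above concrete, here is a minimal sketch of a Linear Memory Network step, assuming the structure the abstract describes: a nonlinear functional component that extracts features from the input and previous memory, feeding a purely linear memory update. All dimensions, weight names, and the initialization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper)
n_in, n_hid, n_mem = 4, 8, 16

# Functional (nonlinear) component parameters
W_xh = rng.standard_normal((n_hid, n_in)) * 0.1
W_mh = rng.standard_normal((n_hid, n_mem)) * 0.1
# Memory component parameters: note the update below is purely
# linear, which is what allows it to be treated as a linear
# autoencoder for the sequence of hidden activations
W_hm = rng.standard_normal((n_mem, n_hid)) * 0.1
W_mm = rng.standard_normal((n_mem, n_mem)) * 0.1

def lmn_step(x_t, m_prev):
    """One sketched LMN step: nonlinear feature extraction,
    then a linear memory update (no nonlinearity on m_t)."""
    h_t = np.tanh(W_xh @ x_t + W_mh @ m_prev)  # functional component
    m_t = W_hm @ h_t + W_mm @ m_prev           # linear memory update
    return h_t, m_t

# Unroll over a short random sequence
m = np.zeros(n_mem)
xs = rng.standard_normal((5, n_in))
for x in xs:
    h, m = lmn_step(x, m)
```

Because the memory update is linear, the memory weights can in principle be initialized in closed form to reconstruct the history of hidden activations, which is the role the paper assigns to the linear autoencoder for sequences.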