Encoding-based memory for recurrent neural networks

Carta, A.; Sperduti, A.; Bacciu, D.
2021-01-01

Abstract

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study memorization from the point of view of the design and training of recurrent neural networks. We study how to maximize the short-term memory of recurrent units, an objective difficult to achieve using backpropagation. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. Experimental results on synthetic and real-world datasets show that the chosen encoding mechanism is superior to static encodings such as orthogonal models and the delay line. The method also outperforms RNN and LSTM units trained using stochastic gradient descent. Experiments on symbolic music modeling show that the training algorithm specialized for the memorization component improves the final performance compared to stochastic gradient descent.
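The abstract describes the architecture only at a high level. Below is a minimal, illustrative sketch of what an encoding-based memory cell of this kind could look like, assuming a nonlinear functional component paired with a purely linear recurrent memory (the linearity is what would make a closed-form autoencoder-based initialization of the memory weights possible). All class, parameter, and size names are hypothetical and are not taken from the paper.

# Minimal sketch of a Linear Memory Network (LMN) style cell, assuming the
# design described in the abstract: a nonlinear "functional" component plus
# a linear recurrent memory component. Names and sizes are illustrative,
# not the authors' implementation.
import numpy as np

class LMNCell:
    def __init__(self, input_size, hidden_size, memory_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_xh = rng.normal(0, s, (hidden_size, input_size))   # input  -> hidden
        self.W_mh = rng.normal(0, s, (hidden_size, memory_size))  # memory -> hidden
        self.W_hm = rng.normal(0, s, (memory_size, hidden_size))  # hidden -> memory (linear)
        self.W_mm = rng.normal(0, s, (memory_size, memory_size))  # memory -> memory (linear)

    def step(self, x, m_prev):
        # Nonlinear functional component reads the input and the previous memory state.
        h = np.tanh(self.W_xh @ x + self.W_mh @ m_prev)
        # Memory update is purely linear; in the paper's setting such a linear
        # recurrence is what a linear autoencoder for sequences could initialize.
        m = self.W_hm @ h + self.W_mm @ m_prev
        return h, m

# Usage: run the cell over a short random sequence.
cell = LMNCell(input_size=3, hidden_size=8, memory_size=8)
m = np.zeros(8)
for x in np.random.default_rng(1).normal(size=(5, 3)):
    h, m = cell.step(x, m)
print(h.shape, m.shape)

In this sketch the memory state is a linear encoding of the history of hidden activations, which is consistent with the abstract's idea of initializing the memory so that it efficiently encodes the network's hidden activations before fine-tuning with gradient descent.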

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1126482