Encoding-based memory for recurrent neural networks
Carta A.; Sperduti A.; Bacciu D.
2021-01-01
Abstract
Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study memorization from the point of view of the design and training of recurrent neural networks. In particular, we investigate how to maximize the short-term memory of recurrent units, an objective that is difficult to achieve using backpropagation. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. Experimental results on synthetic and real-world datasets show that the chosen encoding mechanism is superior to static encodings such as orthogonal models and the delay line. It also outperforms RNN and LSTM units trained with stochastic gradient descent. Experiments on symbolic music modeling show that the training algorithm specialized for the memorization component improves the final performance compared to stochastic gradient descent.
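To make the architecture described above concrete, here is a minimal sketch of a Linear Memory Network step, assuming the structure the abstract describes: a nonlinear functional component that extracts features from the input and previous memory, feeding a purely linear memory update. All dimensions, weight names, and the initialization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper)
n_in, n_hid, n_mem = 4, 8, 16

# Functional (nonlinear) component parameters
W_xh = rng.standard_normal((n_hid, n_in)) * 0.1
W_mh = rng.standard_normal((n_hid, n_mem)) * 0.1
# Memory component parameters: note the update below is purely
# linear, which is what allows it to be treated as a linear
# autoencoder for the sequence of hidden activations
W_hm = rng.standard_normal((n_mem, n_hid)) * 0.1
W_mm = rng.standard_normal((n_mem, n_mem)) * 0.1

def lmn_step(x_t, m_prev):
    """One sketched LMN step: nonlinear feature extraction,
    then a linear memory update (no nonlinearity on m_t)."""
    h_t = np.tanh(W_xh @ x_t + W_mh @ m_prev)  # functional component
    m_t = W_hm @ h_t + W_mm @ m_prev           # linear memory update
    return h_t, m_t

# Unroll over a short random sequence
m = np.zeros(n_mem)
xs = rng.standard_normal((5, n_in))
for x in xs:
    h, m = lmn_step(x, m)
```

Because the memory update is linear, the memory weights can in principle be initialized in closed form to reconstruct the history of hidden activations, which is the role the paper assigns to the linear autoencoder for sequences.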