CINECA IRIS Institutional Research Information System

Solving the scalarization issues of Advantage-based Reinforcement Learning algorithms

Galatolo F. A.; Cimino M. G. C. A.; Vaglini G.

2021-01-01

Abstract

This research investigates some of the issues that arise from scalarizing the multi-objective optimization problem in the Advantage Actor–Critic (A2C) reinforcement learning algorithm. The paper shows how a naive scalarization can lead to overlapping gradients, and discusses the possibility that the entropy regularization term is a source of uncontrolled noise. To address these issues, a technique that avoids gradient overlapping while keeping the same loss formulation is proposed. A method that avoids the uncontrolled noise by sampling actions from distributions with a desired minimum entropy is also investigated. Pilot experiments show how the proposed method speeds up training. The proposed approach can be applied to any Advantage-based Reinforcement Learning algorithm.
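For readers unfamiliar with the term, the following is a minimal sketch of what a naive scalarization of the A2C objectives usually looks like: the policy, value, and entropy terms are collapsed into one weighted sum. It is written in the form commonly used in A2C implementations, not in the paper's own notation. The function names and the coefficients `value_coef` and `entropy_coef` (with their default values) are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def a2c_scalarized_loss(logits, action, value, ret,
                        value_coef=0.5, entropy_coef=0.01):
    """Naive scalarization: one weighted sum of three objectives.

    value_coef and entropy_coef are hypothetical defaults chosen for
    this illustration; they are not taken from the paper.
    """
    probs = softmax(logits)
    advantage = ret - value                      # treated as a constant in the policy term
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = (ret - value) ** 2              # critic regression error
    ent = entropy(probs)                         # exploration bonus
    total = policy_loss + value_coef * value_loss - entropy_coef * ent
    return {"policy": policy_loss, "value": value_loss,
            "entropy": ent, "total": total}


if __name__ == "__main__":
    # Two equally likely actions, critic predicts 0, observed return is 1.
    print(a2c_scalarized_loss([0.0, 0.0], action=0, value=0.0, ret=1.0))
```

Because the three terms are summed into a single scalar, any parameters shared between actor and critic receive the sum of all three gradients when that scalar is differentiated. This is one way to read the overlap that the abstract refers to.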

Year	2021
DOI	https://dx.doi.org/10.1016/j.compeleceng.2021.107117
All authors	Galatolo, F. A.; Cimino, M. G. C. A.; Vaglini, G.

Files in this record:

File	Access	Type	License	Size	Format
2004.04120.pdf	open access	Pre-print document	Creative Commons	1.85 MB	Adobe PDF
1-s2.0-S004579062100121X-main.pdf	authorized users only	Final published version	NOT PUBLIC - private/restricted access	3.53 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1105586
