An Energy-Aware Decision-Making Scheme for Mobile Robots on a Graph Map Based on Deep Reinforcement Learning
Gemignani G.; Bongiorni M.; Pollini L.
2024-01-01
Abstract
Autonomous decision-making has always been one of the primary goals in mobile robotics, and researchers in this field have recently turned their attention to Deep Reinforcement Learning (DRL). This paper presents a Double Deep Q-Network architecture for managing the high-level decisions of a mobile robot involved in a site-servicing task. We envision a scenario where an autonomous service robot must react to alarms caused by failures in its area of interest: the robot must carry the necessary servicing tool on board, resorting to a tool-change station if needed, reach the failure area and fix the failure, while at the same time managing its battery status. One key yet rarely examined property for robots' long-term independence is energy awareness, namely the ability to autonomously manage the charge state as a function of current and future needs. The proposed Deep Q-Network training reward scheme is defined specifically to obtain an energy-aware high-level controller by penalizing both extremely low battery charge levels and unnecessary recharges. The model is numerically simulated on a graph scenario consisting of several failure and charging nodes. Results show that the trained agent always succeeds in reaching the destination without ever incurring a complete discharge, as it promptly performs temporary stops at charging locations whenever needed.
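The record does not include the paper's training code, but the two ingredients the abstract names (the Double Deep Q-Network bootstrap and an energy-aware reward that penalizes deep discharges and unnecessary recharges) can be sketched in Python. This is a minimal illustration, not the authors' implementation: every constant (the discount factor, the penalty weights, the low-battery threshold) and every function name below is an assumption chosen for clarity.

import numpy as np

# Minimal sketch, NOT the authors' implementation: all constants,
# thresholds, and names are illustrative assumptions.

GAMMA = 0.99        # discount factor (assumed)
LOW_BATTERY = 0.2   # normalized charge below which we penalize (assumed)

def energy_aware_reward(reached_goal: bool,
                        battery_level: float,
                        recharged: bool) -> float:
    """Shaped reward; battery_level is the normalized charge in [0, 1]."""
    reward = -0.01                        # small per-step cost, favors short missions
    if reached_goal:
        reward += 1.0                     # bonus for completing the servicing task
    if battery_level <= 0.0:
        reward -= 1.0                     # strong penalty for a complete discharge
    elif battery_level < LOW_BATTERY:
        reward -= 0.1                     # mild penalty for running the battery very low
    if recharged and battery_level > LOW_BATTERY:
        reward -= 0.05                    # penalty for recharging when not needed
    return reward

def double_dqn_target(reward: float, next_state, done: bool,
                      q_online, q_target) -> float:
    """Double DQN bootstrap: q_online and q_target are callables that map a
    state to an array of Q-values. The online network selects the greedy
    action and the target network evaluates it, reducing overestimation."""
    if done:
        return reward
    best_action = int(np.argmax(q_online(next_state)))
    return reward + GAMMA * float(q_target(next_state)[best_action])

In a training loop over the graph scenario, done would be set when the mission ends or the battery is fully depleted, so the complete-discharge penalty acts as a terminal cost while the recharge penalty discourages unnecessary stops at charging nodes.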