CINECA IRIS Institutional Research Information System

Fully convolutional denoising autoencoder for 3D scene reconstruction from a single depth image

Palla, Alessandro; Moloney, David; Fanucci, Luca

2017-01-01

Abstract

In this work, we propose a 3D scene reconstruction algorithm based on a fully convolutional 3D denoising autoencoder neural network. The network reconstructs a full scene from a single depth image by creating a 3D representation of it, automatically filling holes and inserting hidden elements. We exploit the fact that our neural network can generalize object shapes by inferring similarities in geometry. Because the architecture is fully convolutional, the network is not constrained to a fixed 3D shape and can therefore reconstruct scenes of arbitrary size. Our algorithm was evaluated on a real-world dataset of tabletop scenes acquired with a Kinect and processed with KinectFusion software to obtain ground truth for network training and evaluation. Extensive measurements show that our deep neural network architecture outperforms the previous state of the art in both precision and recall on the scene reconstruction task. The network has been broadly profiled in terms of memory footprint, number of floating-point operations, inference time, and power consumption on CPU, GPU, and embedded devices. Its small memory footprint and low computational requirements enable low-power, memory-constrained, real-time, always-on embedded applications such as autonomous vehicles, warehouse robots, interactive gaming controllers, and drones.
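The abstract reports precision and recall for the reconstruction task without stating how they are computed. As a minimal sketch, assuming the reconstruction and the ground truth are both voxel occupancy grids of the same shape and that the comparison is voxel-wise, the two metrics could be computed as follows. The function name `voxel_precision_recall`, the 0.5 occupancy threshold, and the toy grids are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def voxel_precision_recall(predicted, ground_truth, threshold=0.5):
    """Voxel-wise precision and recall between two occupancy grids.

    Assumption (not from the paper): a voxel counts as occupied when its
    value is at or above `threshold`.
    """
    pred = np.asarray(predicted) >= threshold
    truth = np.asarray(ground_truth) >= threshold
    if pred.shape != truth.shape:
        raise ValueError("grids must have the same shape")

    true_pos = np.logical_and(pred, truth).sum()  # occupied in both grids
    pred_pos = pred.sum()                         # occupied in the prediction
    truth_pos = truth.sum()                       # occupied in the ground truth

    # Guard against empty grids to avoid division by zero.
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / truth_pos if truth_pos else 0.0
    return float(precision), float(recall)


# Toy example: the ground truth is a 2x2x2 cube; the prediction covers
# that cube plus one extra 2x2 slab of falsely occupied voxels.
truth = np.zeros((4, 4, 4))
truth[1:3, 1:3, 1:3] = 1
pred = np.zeros((4, 4, 4))
pred[1:3, 1:3, 1:4] = 1

precision, recall = voxel_precision_recall(pred, truth)
```

In this toy case the prediction recovers every ground-truth voxel, so recall is 1.0, while the four extra voxels reduce precision to 8/12.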

Year: 2017

ISBN: 978-1-5386-1107-4

Publication type: 4.1 Conference proceedings contribution

Files in this record:

There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/887780
