Experimental Comparison Between SBC and FPGA for Embedded Neural Network Acceleration
Marialaura Tamburello, Giuseppe Caruso, Davide Adami, Stefano Giordano
2023-01-01
Abstract
The recent impressive growth of AI applications across highly diverse, heterogeneous domains is largely driven by the availability of hardware accelerators, deployed from the back end of data centers (such as TPUs, Tensor Processing Units, or VPUs, Vision Processing Units) to the far edge of embedded devices equipped with DPUs (Deep Learning Processing Units). High-level toolchains that make these platforms easier to use have played a similarly important role. In this paper we consider edge devices, which make an essential contribution to the deployment of “distributed intelligence” and are typically used at the gateway, CPE, or edge-computing level. A common assumption is that Field Programmable Gate Arrays (FPGAs) are far more expensive, in terms of power consumption, than legacy SBCs (single-board computers). The main contribution of this paper is a fair comparison, at the same clock frequency and with the same main CPU, of the processing time and power consumption of two different boards used for deep neural network classification. We highlight the relevance of classification speed with respect to the common KPIs adopted to compare automatic classification performance, such as Loss, Precision, and Recall. This is particularly relevant in the challenging domain of hardware-accelerated real-time control loops, which provide distributed intelligence at the application level as well as within the inner functions of emerging networking architectures.
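The abstract names Precision and Recall among the KPIs used to compare classifiers alongside processing time. As a minimal sketch of how these two metrics are derived from raw prediction counts (the counts below are illustrative, not taken from the paper's experiments):

```python
# Classification KPIs from raw counts: tp = true positives,
# fp = false positives, fn = false negatives.

def precision(tp: int, fp: int) -> float:
    # Fraction of positive predictions that are actually correct.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    # Fraction of actual positives that the classifier recovers.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Illustrative counts for one class of a classifier under test:
p = precision(tp=90, fp=10)  # 90 / 100 = 0.9
r = recall(tp=90, fn=30)     # 90 / 120 = 0.75
print(p, r)
```

These accuracy-oriented KPIs are independent of the hardware platform; the paper's point is that, at equal accuracy, per-inference latency and power draw become the discriminating factors between an FPGA and an SBC.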