Flexible Precision Vector Extension for Energy Efficient Coarse-Grained Reconfigurable Array AI-Engine

Mystkowska G.;Zulberti L.;Monopoli M.;Nannipieri P.;Fanucci L.
2024-01-01

Abstract

The rapid development of Artificial Intelligence (AI) algorithms has created a need for resource-optimised hardware accelerators. Among the various platforms, Coarse-Grained Reconfigurable Arrays (CGRAs) have gained importance as on-edge accelerators. They comprise a heterogeneous matrix of Processing Elements (PEs), which allows for high flexibility and parallelisation of calculations, and they are mainly used to speed up Data Flow Graph (DFG) execution. We aim to provide a general-purpose, highly parameterised, and flexible architecture for AI on-edge data crunching. We propose a CGRA with a vector extension that allows the precision of calculation to be adjusted dynamically while maintaining the desired performance-power-area optimisation. It targets 4-bit integer (INT4) and 8-bit integer (INT8) quantisation for fast and efficient Neural Network (NN) processing. In this paper, we examine the hardware costs required to support the vector extension functionality. We synthesised the design on the 40 nm standard-cell technology from TSMC. The results show that the proposed extension attains, on average, a 28.2% decrease in power consumption and a 21.6% decrease in area compared to a reference design with the same computational power.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1285109