Speeding up Quantized DNN Kernels Having 4-Bit Integer Weights Using the SWAR Approach
Marco Cococcioni
2026-01-01
Abstract
Quantized neural network kernels using very-low-precision arithmetic, such as 4-bit integer weights, are gaining popularity for their reduced memory requirements and improved computational efficiency. However, standard processor architectures are often not optimized for such fine-grained computations. In this paper, we investigate the use of the SIMD Within A Register (SWAR) technique to efficiently execute quantized DNN kernels with 4-bit integer weights. By leveraging bit-level parallelism through SWAR, we achieve significant speedups (up to 7×) over standard implementations. We discuss the key SWAR-based implementation strategies and demonstrate their efficacy through experimental results. The C++ source code is available at https://github.com/lorenzograssi01/swaruint4
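
To illustrate the bit-level parallelism the abstract refers to, the following is a minimal, self-contained C++ sketch of the general SWAR idea: sixteen unsigned 4-bit weights are packed into a single 64-bit word and processed with ordinary scalar instructions. This is not the repository's actual kernel code; the functions pack_nibbles, swar_add_u4, and swar_nibble_sum are hypothetical names introduced here only for illustration.

#include <cstdint>
#include <cstdio>

// Pack 16 unsigned 4-bit values (each in 0..15) into one 64-bit word.
static uint64_t pack_nibbles(const uint8_t w[16]) {
    uint64_t packed = 0;
    for (int i = 0; i < 16; ++i)
        packed |= static_cast<uint64_t>(w[i] & 0xFu) << (4 * i);
    return packed;
}

// SWAR addition of 16 independent 4-bit lanes (modulo 16 per lane):
// the high bit of every nibble is masked off before the add so carries
// cannot spill into the neighbouring lane, then restored with XOR.
static uint64_t swar_add_u4(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8888888888888888ULL; // high bit of each nibble
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

// Horizontal sum of the 16 packed nibbles: split even and odd nibbles
// into byte lanes (at most 15 + 15 = 30 per byte, so no inter-byte carry),
// then sum all bytes via a widening multiply by the constant 0x0101...01.
static uint32_t swar_nibble_sum(uint64_t v) {
    const uint64_t M = 0x0F0F0F0F0F0F0F0FULL;
    uint64_t bytes = (v & M) + ((v >> 4) & M);
    return static_cast<uint32_t>((bytes * 0x0101010101010101ULL) >> 56);
}

int main() {
    uint8_t w[16];
    for (int i = 0; i < 16; ++i) w[i] = static_cast<uint8_t>(i); // weights 0..15
    uint64_t a = pack_nibbles(w);
    uint64_t b = pack_nibbles(w);
    std::printf("sum of lanes of a    : %u\n", swar_nibble_sum(a));                  // 120
    std::printf("sum of (a+b) lanes   : %u\n", swar_nibble_sum(swar_add_u4(a, b))); // per-lane add, mod 16
    return 0;
}

The carry-blocking mask in swar_add_u4 is what lets a single 64-bit scalar operation act as sixteen independent 4-bit lanes, which is the essence of the SWAR approach; the paper's kernels build on the same principle for 4-bit-weight DNN computations.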


