Speeding up Quantized DNN Kernels Having 4-Bit Integer Weights Using the SWAR Approach
Marco Cococcioni
2026-01-01
Abstract
Quantized neural network kernels using very-low-precision arithmetic, such as 4-bit integer weights, are gaining popularity for their reduced memory requirements and improved computational efficiency. However, standard processor architectures are often not optimized for such fine-grained computations. In this paper, we investigate the use of the SIMD Within A Register (SWAR) technique to efficiently execute quantized DNN kernels with 4-bit integer weights. By leveraging bit-level parallelism through SWAR, we achieve significant speedups (up to 7×) over standard implementations. We discuss the key SWAR-based implementation strategies and demonstrate their efficacy through experimental results. The C++ source code is available at https://github.com/lorenzograssi01/swaruint4
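
To illustrate the bit-level parallelism the abstract refers to, the following is a minimal, self-contained C++ sketch of the general SWAR idea: sixteen unsigned 4-bit weights are packed into a single 64-bit word and processed with ordinary scalar instructions. This is not the repository's actual kernel code; the functions pack_nibbles, swar_add_u4, and swar_nibble_sum are hypothetical names introduced here only for illustration.

#include <cstdint>
#include <cstdio>

// Pack 16 unsigned 4-bit values (each in 0..15) into one 64-bit word.
static uint64_t pack_nibbles(const uint8_t w[16]) {
    uint64_t packed = 0;
    for (int i = 0; i < 16; ++i)
        packed |= static_cast<uint64_t>(w[i] & 0xFu) << (4 * i);
    return packed;
}

// SWAR addition of 16 independent 4-bit lanes (modulo 16 per lane):
// the high bit of every nibble is masked off before the add so carries
// cannot spill into the neighbouring lane, then restored with XOR.
static uint64_t swar_add_u4(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8888888888888888ULL; // high bit of each nibble
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

// Horizontal sum of the 16 packed nibbles: split even and odd nibbles
// into byte lanes (at most 15 + 15 = 30 per byte, so no inter-byte carry),
// then sum all bytes via a widening multiply by the constant 0x0101...01.
static uint32_t swar_nibble_sum(uint64_t v) {
    const uint64_t M = 0x0F0F0F0F0F0F0F0FULL;
    uint64_t bytes = (v & M) + ((v >> 4) & M);
    return static_cast<uint32_t>((bytes * 0x0101010101010101ULL) >> 56);
}

int main() {
    uint8_t w[16];
    for (int i = 0; i < 16; ++i) w[i] = static_cast<uint8_t>(i); // weights 0..15
    uint64_t a = pack_nibbles(w);
    uint64_t b = pack_nibbles(w);
    std::printf("sum of lanes of a    : %u\n", swar_nibble_sum(a));                  // 120
    std::printf("sum of (a+b) lanes   : %u\n", swar_nibble_sum(swar_add_u4(a, b))); // per-lane add, mod 16
    return 0;
}

The carry-blocking mask in swar_add_u4 is what lets a single 64-bit scalar operation act as sixteen independent 4-bit lanes, which is the essence of the SWAR approach; the paper's kernels build on the same principle for 4-bit-weight DNN computations.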


