LLAMA-2 Acceleration Using the ARM Scalable Vector Extension

Rossi, F.; Cococcioni, M.; Saponara, S.
2024-01-01

Abstract

The rise of large language models (LLMs) has spurred recent advances in artificial intelligence (AI), transforming natural language generation and processing. These models perform exceptionally well on a variety of tasks, including machine translation and sentiment analysis, thanks to their unparalleled size and complexity. However, that same complexity poses computational difficulties that call for strong hardware acceleration and effective algorithms. To tackle this, we investigate how to speed up LLM workloads using the ARM Scalable Vector Extension (SVE). With its vectorization capabilities, SVE can improve the parallel processing performance of ARM-based processors. We present the results of this approach, describing the features of SVE and discussing optimization strategies for LLMs on high-performance computing systems. Our experiments show that SVE auto-vectorization yields a speed-up of up to 4.25× in training time compared to non-SVE-optimized code.
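As an illustration of the auto-vectorization approach described in the abstract (this sketch is not taken from the paper; the function name and compiler flags are assumptions), the C kernel below shows the kind of dot-product reduction that dominates the matrix-vector products in a LLAMA-2 forward pass. Built with an SVE-enabled toolchain, e.g. gcc or clang with -O3 -ffast-math -march=armv8-a+sve, the compiler can turn this scalar loop into vector-length-agnostic SVE code without any source changes.

#include <stddef.h>

/* Illustrative sketch, not code from the paper: a reduction loop the compiler
 * can auto-vectorize with SVE when built with, for example,
 *     gcc -O3 -ffast-math -march=armv8-a+sve dot.c
 * -ffast-math allows the compiler to reorder the floating-point additions,
 * which it needs in order to vectorize the accumulation. */
float dot_product(const float *restrict a, const float *restrict b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];   /* typically lowered to SVE FMLA plus a final reduction */
    return acc;
}

Because SVE is vector-length agnostic, the same binary can exploit vector units from 128 up to 2048 bits wide, which is what makes compiler auto-vectorization an attractive route on HPC-class ARM processors such as those targeted in the paper.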
Files associated with this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1307448
