Squeeze and Learn: Compressing Long Sequences with Fourier Transformers for Gene Expression Prediction

Vittorio Pipoli; Giuseppe Attanasio
2024-01-01

Abstract

Genes regulate fundamental processes in living cells, such as the synthesis of proteins and other functional molecules. Studying gene expression is hence crucial for both diagnostic and therapeutic purposes. State-of-the-art deep learning techniques such as Xpresso have been proposed to predict gene expression from raw DNA sequences. However, DNA sequences challenge computational approaches because of their length, typically on the order of thousands of nucleotides, and their sparsity, requiring models to capture both short- and long-range dependencies. Indeed, applying recent techniques like Transformers is prohibitive with common hardware resources. This paper proposes FNetCompression, a novel gene-expression prediction method. Crucially, FNetCompression combines convolutional encoders and memory-efficient Transformers to compress the sequence by up to 95% with minimal performance tradeoff. Experiments on the Xpresso dataset show that FNetCompression outperforms our baselines by a statistically significant margin. Moreover, FNetCompression is 88% faster than a classical transformer-based architecture with minimal performance tradeoff.
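
The abstract describes the core idea at a high level: a convolutional encoder first compresses the raw DNA sequence, and memory-efficient Fourier Transformer (FNet-style) blocks then model long-range dependencies over the shortened sequence. The sketch below illustrates that idea only; it is not the authors' implementation, and every module name, layer size, and hyperparameter (e.g. ConvFourierRegressor, dim=128, pool=20) is an illustrative assumption.

# Minimal sketch of the architecture described in the abstract, not the authors' code.
# A strided conv/pooling encoder shortens the one-hot DNA sequence (pool=20 gives a
# ~95% reduction), and FNet-style Fourier mixing replaces quadratic self-attention.
import torch
import torch.nn as nn


class FourierMixingBlock(nn.Module):
    """FNet-style block: token mixing via a 2D FFT instead of self-attention."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):  # x: (batch, seq_len, dim)
        # FFT over the sequence and feature axes; keep only the real part.
        mixed = torch.fft.fft2(x).real
        x = self.norm1(x + mixed)
        return self.norm2(x + self.ff(x))


class ConvFourierRegressor(nn.Module):
    """Conv encoder compresses the sequence; Fourier blocks model long-range context."""

    def __init__(self, seq_channels=4, dim=128, n_blocks=4, pool=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(seq_channels, dim, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(pool),  # downsample the sequence by a factor of `pool`
        )
        self.blocks = nn.Sequential(*[FourierMixingBlock(dim, 4 * dim) for _ in range(n_blocks)])
        self.head = nn.Linear(dim, 1)  # scalar gene-expression value

    def forward(self, x):  # x: (batch, 4, seq_len) one-hot DNA
        z = self.encoder(x).transpose(1, 2)  # (batch, seq_len / pool, dim)
        z = self.blocks(z).mean(dim=1)       # global average over positions
        return self.head(z).squeeze(-1)


# Example: a batch of 2 sequences of length 10,000 -> 2 expression scores.
model = ConvFourierRegressor()
print(model(torch.randn(2, 4, 10_000)).shape)  # torch.Size([2])

Replacing self-attention with an FFT-based mixing step removes the quadratic attention matrices, which is consistent with the memory savings and the reported speedup over a classical transformer-based architecture.
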
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11568/1324628