Parallelizing RNA-Seq Analysis with BioSkel: A FastFlow Based Prototype
Tonci, Nicolo;
2025-01-01
Abstract
Over the past decade, the widespread adoption of RNA-seq methodology for transcript-level monitoring has resulted in a surge of biological data requiring comprehensive analysis. The BioSkel project aims to develop a framework for RNA sequencing analysis on multi/many-core machines. This framework relies on generic and modular high-level parallel patterns, enabling biologists to customize their data processing to their specific needs while abstracting away the complexities of parallelization. In this study, we introduce the initial prototype of BioSkel for RNA sequencing analysis, which comprises three main steps: sequence alignment, feature counting, and differential expression analysis. This prototype leverages FastFlow as a back-end for parallelizing the execution in both shared-memory and distributed-memory settings. We provide experimental validation of our approach, considering different architectures and dataset sizes. As a valuable byproduct, we introduce a distributed HPC version of the Bowtie2 tool, the first publicly available one to our knowledge.


