Streaming I/O for scientific workflow engine acceleration
Torquati M.;
2026-01-01
Abstract
Scientific workflows are increasingly characterized by complex task dependencies and large-scale data exchanges, which place significant pressure on the input/output (I/O) systems of traditional Workflow Engines (WFEs). These challenges are particularly evident in data-intensive and real-time processing contexts, where conventional disk-based I/O mechanisms often become performance bottlenecks. This paper presents an approach to enhancing the DAGonStar scientific workflow engine by integrating CAPIO, a middleware designed to support memory-based streaming I/O. The integration combines DAGonStar's orchestration capabilities with CAPIO's efficient data handling to better support workflows operating on continuous or large-scale datasets. We describe the architectural modifications introduced to enable this collaboration and provide an analysis of the resulting system. The proposed solution aims to improve the responsiveness and flexibility of scientific workflows by streamlining data transfers and simplifying task coordination. This work contributes to the evolution of workflow systems toward more efficient and scalable models for scientific computing.


