Benchmarking batch and stream processing execution modes in Apache Flink
Mencagli, Gabriele; Griebler, Dalvan
2026-01-01
Abstract
With the proliferation of data across diverse computing domains, the ability to process raw information and extract actionable insights has become increasingly vital for both enterprises and public institutions. To address this demand, developers have built applications that consume and analyze large-scale data streams. Distributed Stream Processing Systems (DSPSs) have emerged as a solution: they offer programming models that simplify development and runtime configuration by abstracting away complexities such as scheduling, fault tolerance, query processing, and system optimizations. Benchmark suites play a critical role in evaluating the capabilities of these frameworks because they enable fair and comprehensive performance comparisons. One such suite, DSPBench, comprises 15 applications originally implemented using Apache Storm. In this work, we extend DSPBench by porting all available applications to Apache Flink. A distinguishing feature of Flink is that it supports both streaming and batch execution modes through a unified API. Beyond extending the benchmark suite, we contribute a detailed analysis of the implications of using these two execution modes, along with a benchmarking study across a wide range of real-world streaming workloads that evaluates their respective strengths and limitations. The results show that Flink achieves higher throughput than Storm for most applications. Furthermore, Flink’s streaming mode delivers higher throughput, whereas its batch mode consumes fewer resources.
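One concrete way the two execution modes differ in output semantics: according to the Flink documentation on execution modes, rolling operations such as `sum()` or `reduce()` on a keyed stream emit an incremental update for every incoming record in STREAMING mode, but only the final result per key in BATCH mode. In a real Flink job the mode is chosen via the `execution.runtime-mode` option (`STREAMING`, `BATCH`, or `AUTOMATIC`) or with `env.setRuntimeMode(RuntimeExecutionMode.BATCH)`. The plain-Python sketch below only simulates that semantic difference on a bounded input. It does not use Flink, the functions `keyed_sum_streaming` and `keyed_sum_batch` are hypothetical names, and it illustrates output behavior, not performance.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def keyed_sum_streaming(records: Iterable[Record]) -> List[Record]:
    """Hypothetical simulation of a rolling keyed sum (STREAMING-style):
    emit the updated total for a key after every input record."""
    totals: Dict[str, int] = defaultdict(int)
    emitted: List[Record] = []
    for key, value in records:
        totals[key] += value
        emitted.append((key, totals[key]))  # one output per input record
    return emitted


def keyed_sum_batch(records: Iterable[Record]) -> List[Record]:
    """Hypothetical simulation of a keyed sum over bounded input (BATCH-style):
    emit only the final total per key once the input is exhausted."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in records:
        totals[key] += value  # accumulate without emitting anything yet
    # Sorted by key only to make this example's output deterministic.
    return sorted(totals.items())


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3)]
    print(keyed_sum_streaming(records))  # [('a', 1), ('b', 2), ('a', 4)]
    print(keyed_sum_batch(records))      # [('a', 4), ('b', 2)]
```

Both variants agree on the last value per key; the streaming-style variant simply emits intermediate updates along the way, while the batch-style variant waits for the end of the bounded input.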


