DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems
Griebler D.; Mencagli G.
2020-01-01
Abstract
Applications that continuously process large data streams have recently attracted the attention of both the scientific community and industrial stakeholders. The need for high-level programming tools has led to the design of Data Stream Processing Systems (DSPSs), which ease the development of streaming applications in distributed computing environments. Several such systems have been released and are currently maintained as open-source projects, such as Apache Storm and Spark Streaming. Although the scientific community often uses benchmark applications to test and evaluate new techniques for improving the performance and usability of DSPSs, the available benchmark suites still lack representative workloads from the different areas of interest in the stream processing domain. This paper presents a new benchmark suite composed of 15 applications from areas such as Finance, Telecommunication, Sensor Networks, and Social Networks, among others. The paper describes the nature of these applications in detail, gives their full workload characterization in terms of selectivity, processing cost, input size, and overall memory occupation, and provides a first assessment of the usefulness of the benchmark suite for comparing real DSPSs, using Apache Storm and Spark Streaming for this analysis.