DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Griebler D.; Mencagli G.
2020-01-01

Abstract

Applications characterized by the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. The need for high-level programming tools has led to the design of Data Stream Processing Systems (DSPSs) that ease the development of streaming applications in distributed computing environments. Several systems of this kind, such as Apache Storm and Spark Streaming, have been released and are currently maintained as open-source projects. Although the scientific community often uses benchmark applications to test and evaluate new techniques for improving the performance and usability of DSPSs, the available benchmark suites still lack representative workloads from the different areas of interest in the stream processing domain. The goal of this paper is to present a new benchmark suite composed of 15 applications from areas such as Finance, Telecommunications, Sensor Networks, Social Networks and others. The paper describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. It also provides a first assessment of the usefulness of the benchmark suite for comparing real DSPSs, using Apache Storm and Spark Streaming for this analysis.
2020
Bordin, M. V.; Griebler, D.; Mencagli, G.; Geyer, C. F. R.; Fernandes, L. G.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1066388