Scalable join operators over data streams with shared-nothing parallelism

Mencagli, G.; Rymarchuk, Y.; Griebler, D.
2026-01-01

Abstract

Stream joins are among the most computationally demanding stateful operators in stream processing. Tuples arriving from different streams must be analyzed on-the-fly to identify pairs that satisfy specific user-defined conditions. Since buffering all tuples from the input streams is infeasible due to memory constraints, stream joins are typically computed over a subset of the received tuples. This subset is often organized either by a specific time interval (online interval joins) or by fixed-length temporal windows with a defined slide (window joins). In this paper, we present various parallel patterns for stream join computation, aimed at effectively increasing overall query throughput. Our focus is on leveraging shared-nothing parallelism to provide portable parallelization strategies that can be efficiently executed on modern scale-in and scale-out Stream Processing Engines. Among the proposed patterns, the one exhibiting hybrid parallelism emerges as the most promising in terms of performance and load balancing. The experimental evaluation highlights the performance characteristics of the proposed patterns using real-world datasets and diverse key distributions, and compares them with state-of-the-art solutions, confirming the effectiveness of the parallel pattern with hybrid parallelism against the main competitors.
2026
Mencagli, G.; Rymarchuk, Y.; Griebler, D.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1359929