Larger-Than-Memory Stateful Stream Processing with WindFlow

Frassinelli, S.; Mencagli, G.
2025-01-01

Abstract

In stream processing, a vast volume of data is continuously processed by standing queries that extract insights from raw inputs. These queries often maintain an internal state, representing useful information from the stream’s history, to produce results. Notable examples of state paradigms include sliding windows, where computation is periodically repeated over the most recent data (e.g., inputs received in the last ten seconds, sliding every half second). Additionally, this state is replicated per distinct key, a user-defined attribute used to partition the physical stream into logical sub-streams. The combination of numerous keys (often millions in real-world scenarios) and the window size can make the overall state of a streaming query enormous, potentially exceeding available memory. This issue is particularly critical when processing runs on resource-constrained, low-end devices, as in the Edge computing paradigm. In this paper, we focus on designing a family of persistent operators capable of transparently maintaining their internal state in an external Key-Value Store, thereby leveraging secondary memory. We present this design within the context of the WindFlow stream processing library for multi-core architectures. The paper details our design and implementation, along with an experimental evaluation based on a set of benchmarks, to assess the performance of persistent operators compared with traditional in-memory processing.
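
To make the state-size argument concrete, the following is a minimal sketch of per-key sliding-window state using the abstract's example parameters (ten-second windows sliding every half second). It is not the WindFlow API and not the authors' operator design: the names `window_ids` and `KeyedSlidingCount` are hypothetical, windows are assumed to be aligned at multiples of the slide, and a plain Python dict stands in for the external Key-Value Store.

```python
def window_ids(ts_ms, size_ms, slide_ms):
    """Indices w of the windows [w*slide, w*slide + size) that contain ts_ms.

    Assumption: window w starts at w * slide_ms, with w >= 0.
    """
    last = ts_ms // slide_ms
    first = max(0, (ts_ms - size_ms) // slide_ms + 1)
    return range(first, last + 1)


class KeyedSlidingCount:
    """Counts inputs per (key, window); one store entry per pair."""

    def __init__(self, size_ms, slide_ms, store=None):
        self.size_ms = size_ms
        self.slide_ms = slide_ms
        # Hypothetical stand-in for an external key-value store:
        # anything with dict-like get/set access would fit here.
        self.store = {} if store is None else store

    def insert(self, key, ts_ms):
        # An input updates every window that overlaps its timestamp.
        for w in window_ids(ts_ms, self.size_ms, self.slide_ms):
            entry = (key, w)
            self.store[entry] = self.store.get(entry, 0) + 1

    def result(self, key, w):
        return self.store.get((key, w), 0)


op = KeyedSlidingCount(size_ms=10_000, slide_ms=500)
op.insert("sensor-a", 10_250)
op.insert("sensor-a", 10_400)
op.insert("sensor-b", 10_250)
print(len(op.store))  # number of (key, window) entries currently held
```

With these parameters each input falls into size/slide = 20 overlapping windows, so the number of live entries grows roughly as 20 times the number of active keys. With millions of keys, this multiplication is what can push the state beyond main memory.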

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1305327