Enabling large-state stream processing on memory-constrained multi-core systems via key-value stores

Mencagli G.; Griebler D.
2026-01-01

Abstract

Stream processing is a computing paradigm for extracting insights and analytics efficiently from continuous data streams. Stream processing applications typically maintain large states, often represented as sliding temporal windows that are replicated per key, i.e., one set of windows for each group of stream items sharing the same key attribute(s). Larger-than-memory scenarios are common in real-world applications, especially on low- and mid-range servers or edge resources with limited memory, which makes efficient interaction with secondary storage crucial. This paper proposes strategies to represent the large state of stream processing applications on Key-Value Stores (KVSs). The strategies distinguish between fragment-based and window-centric layouts, and they manage archives either by partitioning the state data structures by key or by leaving them unpartitioned. We also highlight state caching optimizations that can further improve throughput. We implemented all proposed strategies in WindFlow, a parallel library for stream processing on multicores. The experimental analysis highlights trade-offs among performance, memory footprint, and secondary storage requirements, and shows the effectiveness of the optimizations in balancing performance and memory consumption. We also compare our strategies with Apache Flink and its support for external state backends, highlighting the better efficiency and effectiveness of the proposed approach.
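
To make the layout distinction concrete, the sketch below shows one possible reading of "window-centric" versus "fragment-based" state on a key-value store. It is an illustrative assumption, not the paper's implementation and not the WindFlow API: a plain Python dictionary stands in for the KVS, and the names `WINDOW`, `SLIDE`, `put_window_centric`, `put_fragment_based` and the two read functions are hypothetical. In this reading, the window-centric layout stores one entry per (key, window) and copies each item into every window that contains it, while the fragment-based layout stores each item once in a (key, fragment) entry and rebuilds a window from consecutive fragments.

```python
from collections import defaultdict

# Assumed parameters (not from the paper): time-based sliding windows
# of length WINDOW that advance by SLIDE; WINDOW is a multiple of SLIDE.
WINDOW = 10
SLIDE = 5


def window_ids(ts):
    """Ids of all windows [w*SLIDE, w*SLIDE + WINDOW) that contain ts."""
    last = ts // SLIDE
    first = max(0, last - WINDOW // SLIDE + 1)
    return range(first, last + 1)


def put_window_centric(store, key, ts, value):
    """One KVS entry per (key, window); the item is copied into each window."""
    for w in window_ids(ts):
        store[(key, w)].append((ts, value))


def put_fragment_based(store, key, ts, value):
    """One KVS entry per (key, fragment); the item is stored exactly once."""
    store[(key, ts // SLIDE)].append((ts, value))


def read_window_centric(store, key, w):
    """A window is a single lookup."""
    return list(store.get((key, w), []))


def read_fragment_based(store, key, w):
    """A window is rebuilt from WINDOW // SLIDE consecutive fragments."""
    items = []
    for f in range(w, w + WINDOW // SLIDE):
        items.extend(store.get((key, f), []))
    return items


# Hypothetical keyed stream of (key, timestamp, value) items.
events = [("a", 1, 10), ("a", 7, 20), ("a", 12, 30), ("b", 3, 5)]

wc_store = defaultdict(list)  # dictionary standing in for a KVS
fb_store = defaultdict(list)
for key, ts, value in events:
    put_window_centric(wc_store, key, ts, value)
    put_fragment_based(fb_store, key, ts, value)

wc_items = sum(len(v) for v in wc_store.values())
fb_items = sum(len(v) for v in fb_store.values())
```

Under these assumptions both layouts return the same content for any window, but they trade storage for read cost: the window-centric store holds overlapping copies and answers a window with one lookup, whereas the fragment-based store holds each item once and needs several lookups per window.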
2026
Filippi, A.; Mencagli, G.; Griebler, D.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1354567