Enabling large-state stream processing on memory-constrained multi-core systems via key-value stores
Mencagli G.; Griebler D.
2026-01-01
Abstract
Stream processing is a computing paradigm that enables the efficient extraction of insights and analytics from continuous data streams. These applications typically store large states, often represented as sliding temporal windows that are replicated for different keys, i.e., grouping stream items with the same key attribute(s). Larger-than-memory scenarios are common in real-world applications, especially on low-/mid-range servers or edge resources with limited memory, making efficient interaction with secondary storage crucial. This paper proposes strategies to represent the large state of stream processing applications on Key-Value Stores (KVSs). These strategies distinguish between fragment-based and window-centric layouts and manage archives by partitioning state data structures by keys or not. We highlight potential state caching optimizations to further improve throughput. We implemented all proposed strategies on WindFlow, a parallel library for stream processing on multicores. The experimental analysis highlights trade-offs between performance, memory footprint, and secondary storage requirements, as well as the effectiveness of optimizations in balancing performance and memory consumption. We also compare our strategies with Apache Flink and its support for external state backends, highlighting the better efficiency and effectiveness of our proposed approach.


