Data processing pipelines normally use lockless Single-Producer–Single-Consumer (SPSC) queues to efficiently decouple their processing threads and achieve high throughput, minimizing the cost of synchronization. SPSC queues have been widely studied, mostly for applications such as streaming data or network monitoring, where the main goal is maximizing throughput. There are now many applications, such as virtual-machine–virtual-machine communication, software-defined networking, and message-based kernels, where low latency is also important, and the tradeoffs between high-throughput and low-latency algorithms have not been studied equally well. Furthermore, at high or variable transaction rates, the effect of memory hierarchies and cache coherence subsystems may be dominant and yield surprising results. In this paper, we make two contributions. First, we provide a comprehensive study of the two main families of SPSC queues, namely, “Lamport” and “FastForward” queues, with a detailed analytical and experimental characterization of their behavior in terms of operating regimes, throughput, latency, and cache misses. Second, we propose two new queue variants, namely, improved FastForward and batched improved FastForward, which have better worst-case behavior than other variants in terms of cache misses, which is an important feature for a number of applications. Together, these two contributions provide practical guidelines to choose the best solution depending on the application requirements.

Cache-aware design of general-purpose Single-Producer–Single-Consumer queues

Maffione, Vincenzo
;
Lettieri, Giuseppe;Rizzo, Luigi
2019-01-01

Abstract

Data processing pipelines normally use lockless Single-Producer–Single-Consumer (SPSC) queues to efficiently decouple their processing threads and achieve high throughput, minimizing the cost of synchronization. SPSC queues have been widely studied, mostly for applications such as streaming data or network monitoring, where the main goal is maximizing throughput. There are now many applications, such as virtual-machine–virtual-machine communication, software-defined networking, and message-based kernels, where low latency is also important, and the tradeoffs between high-throughput and low-latency algorithms have not been studied equally well. Furthermore, at high or variable transaction rates, the effect of memory hierarchies and cache coherence subsystems may be dominant and yield surprising results. In this paper, we make two contributions. First, we provide a comprehensive study of the two main families of SPSC queues, namely, “Lamport” and “FastForward” queues, with a detailed analytical and experimental characterization of their behavior in terms of operating regimes, throughput, latency, and cache misses. Second, we propose two new queue variants, namely, improved FastForward and batched improved FastForward, which have better worst-case behavior than other variants in terms of cache misses, which is an important feature for a number of applications. Together, these two contributions provide practical guidelines to choose the best solution depending on the application requirements.
2019
Maffione, Vincenzo; Lettieri, Giuseppe; Rizzo, Luigi
File in questo prodotto:
File Dimensione Formato  
spe.2675.pdf

solo utenti autorizzati

Tipologia: Versione finale editoriale
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 2.13 MB
Formato Adobe PDF
2.13 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
master.pdf

accesso aperto

Tipologia: Documento in Post-print
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 506.18 kB
Formato Adobe PDF
506.18 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/942147
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 5
social impact