Lempel-Ziv's LZ77 algorithm is the de facto choice for compressing massive datasets (see e.g., Snappy in BigTable, Lz4 in Cassandra) because its algorithmic structure is flexible enough to guarantee very fast decompression speed at reasonable compressed-space occupancy. Recent theoretical results have shown how to design a bit-optimal LZ77-compressor which minimizes the compress size and how to deploy it in order to design a bicriteria data compressor, namely an LZ77-compressor which trades compressed-space occupancy versus its decompression time in a smoothed and principled way. Preliminary experiments were promising but raised many algorithmic and engineering questions which have to be addressed in order to turn these algorithmic results into an effective and practical tool. In this paper we address these issues by first designing a novel bit-optimal LZ77-compressor which is simple, cache-aware and asymptotically optimal. We benchmark our approach by investigating several algorithmic and implementation issues over many dataset types and sizes, and against an ample class of classic (LZ-based, PPM-based and BWT-based) as well as engineered compressors (Snappy, Lz4, and Lzma2). We conclude noticing how our novel bicriteria LZ77-compressor improves the state-of-the-art of fast (de)compressors Snappy and Lz4. © 2014 Springer-Verlag Berlin Heidelberg.

Bicriteria Data Compression: Efficient and Usable

FARRUGGIA, ANDREA;FERRAGINA, PAOLO;VENTURINI, ROSSANO
2014

Abstract

Lempel-Ziv's LZ77 algorithm is the de facto choice for compressing massive datasets (see e.g., Snappy in BigTable, Lz4 in Cassandra) because its algorithmic structure is flexible enough to guarantee very fast decompression speed at reasonable compressed-space occupancy. Recent theoretical results have shown how to design a bit-optimal LZ77-compressor which minimizes the compress size and how to deploy it in order to design a bicriteria data compressor, namely an LZ77-compressor which trades compressed-space occupancy versus its decompression time in a smoothed and principled way. Preliminary experiments were promising but raised many algorithmic and engineering questions which have to be addressed in order to turn these algorithmic results into an effective and practical tool. In this paper we address these issues by first designing a novel bit-optimal LZ77-compressor which is simple, cache-aware and asymptotically optimal. We benchmark our approach by investigating several algorithmic and implementation issues over many dataset types and sizes, and against an ample class of classic (LZ-based, PPM-based and BWT-based) as well as engineered compressors (Snappy, Lz4, and Lzma2). We conclude noticing how our novel bicriteria LZ77-compressor improves the state-of-the-art of fast (de)compressors Snappy and Lz4. © 2014 Springer-Verlag Berlin Heidelberg.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11568/640064
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 5
social impact