We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the original text size---without requiring a separate instance of the text. We can additionally use a simple notion to encode and decode block-sorting transforms (such as the Burrows--Wheeler transform), achieving a compression ratio comparable to that of bzip2. We also provide a compressed representation of suffix trees (and their associated text) in a total space that is comparable to that of the text alone compressed with gzip.
|Authors:||FOSCHINI L; GROSSI R; GUPTA A; VITTER J.S|
|Title:||When indexing equals compression: Experiments with compressing suffix arrays and applications|
|Year of publication:||2006|
|Digital Object Identifier (DOI):||10.1145/1198513.1198521|
|Appears in types:||1.1 Journal article|