We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let $n$ be the text length and $sigma$ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in $O(nlogsigma)$ time using just $o(nlogsigma)$ bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter $0 < epsilon leq 1$ we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in $Oleft(n(logsigma + epsilon^{-1}cdot loglog n) ight)$ time using at most $nlogsigma cdot(epsilon + o(1))$ bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. $o(nlogsigma)$ bits) and $Thetaleft(nlogsigma + n(loglog n)^{1+delta} ight)$ time, for any constant $delta>0$. This improves the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(nlog n)$ time. We also consider the problem of merging BWTs of string collections, and provide a solution running in $O(nlogsigma)$ time and using just $o(nlogsigma)$ bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as $n$ bits on top of a packed representation of the input/output and process data as fast as $2.92$ megabases per second.

Space-efficient construction of compressed suffix trees

Rosone, Giovanna
2021-01-01

Abstract

We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let $n$ be the text length and $sigma$ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in $O(nlogsigma)$ time using just $o(nlogsigma)$ bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter $0 < epsilon leq 1$ we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in $Oleft(n(logsigma + epsilon^{-1}cdot loglog n) ight)$ time using at most $nlogsigma cdot(epsilon + o(1))$ bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. $o(nlogsigma)$ bits) and $Thetaleft(nlogsigma + n(loglog n)^{1+delta} ight)$ time, for any constant $delta>0$. This improves the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(nlog n)$ time. We also consider the problem of merging BWTs of string collections, and provide a solution running in $O(nlogsigma)$ time and using just $o(nlogsigma)$ bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as $n$ bits on top of a packed representation of the input/output and process data as fast as $2.92$ megabases per second.
2021
Prezza, Nicola; Rosone, Giovanna
File in questo prodotto:
File Dimensione Formato  
PR_TCS2021.pdf

Open Access dal 09/01/2023

Descrizione: Articolo
Tipologia: Documento in Post-print
Licenza: Creative commons
Dimensione 439.58 kB
Formato Adobe PDF
439.58 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1062324
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 2
social impact