In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by <_lex and ≺_ω , that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the ≺  ω order between two primitive words by using a preliminary knowledge of the <_lex order of the corresponding Lyndon conjugates. Moreover, we propose an algorithm that efficiently sorts, according to the ≺_ω order, the list of conjugates of a multiset of Lyndon words. Finally, we show that the Lyndon factorization of a word helps the construction of its suffix array, allowing a reduction of the number of symbol comparisons needed to lexicographically sort the suffixes of the word.

Suffixes, Conjugates and Lyndon Words

ROSONE, GIOVANNA;
2013-01-01

Abstract

In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by <_lex and ≺_ω , that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the ≺  ω order between two primitive words by using a preliminary knowledge of the <_lex order of the corresponding Lyndon conjugates. Moreover, we propose an algorithm that efficiently sorts, according to the ≺_ω order, the list of conjugates of a multiset of Lyndon words. Finally, we show that the Lyndon factorization of a word helps the construction of its suffix array, allowing a reduction of the number of symbol comparisons needed to lexicographically sort the suffixes of the word.
2013
9783642387708
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/728489
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? ND
social impact