A generalised degenerate string (GD string) S is a sequence of n sets of strings of total size N, where the i-th set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S. Finally, proof-of-concept experimental results are presented using real protein datasets.

Degenerate String Comparison and Applications

Roberto Grossi;Nadia Pisanti
;
Giovanna Rosone
2018-01-01

Abstract

A generalised degenerate string (GD string) S is a sequence of n sets of strings of total size N, where the i-th set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S. Finally, proof-of-concept experimental results are presented using real protein datasets.
2018
978-3-95977-082-8
File in questo prodotto:
File Dimensione Formato  
wabi2018-deg.pdf

accesso aperto

Tipologia: Versione finale editoriale
Licenza: Creative commons
Dimensione 551.62 kB
Formato Adobe PDF
551.62 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/924119
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 13
  • ???jsp.display-item.citation.isi??? ND
social impact