The Elastic Degenerate String Matching (EDSM) problem is defined as that of finding an occurrence of a pattern P of length m in an ED-text T . A D-text (Degenerate text) is a string that actually represents a set of similar and aligned strings (e.g. a pan-genome [5]) by collapsing common fragments into a standard string, and representing variants with sets of alternative substrings. When such substrings are not bound to have the same size, then we talk about elastic D-strings (ED-strings). In [6] we gave an O(nm2 + N ) time on-line algorithm for EDSM, where n is the length of T and N is its size, defined as the total number of letters. A fundamental toolkit of our algorithm is the O(m2+N ) time solution of the later called Active Prefixes problem (AP). In [2], a O(m1.5√log m+N ) solution for AP was shown, leading to a O(nm1.5√log m + N ) time solution for EDSM. The natural open problem was thus whether the 1.5 exponent could furtherly be decreased. In [3], we prove several properties that answer this and other questions: we give a conditional O(nm1.5 + N ) lower bound for EDSM, proving that a combinatorial algorithm solving EDSM in O(nm1.5−ε + N ) time would break the Boolean Matrix Multiplication (BMM) conjecture; we use this result as a hint to devise a non-combinatorial algorithm that solves EDSM in O(nm1.381 + N ) time; we do so by successfully combining Fast Fourier Transform and properties of string periodicity. In my talk I will overview the results above, as well as some interesting side results: the extension to a dictionary rather than a single pattern [7], the introduction of errors [4], and a notion of matching among D-strings with its linear time solution [1]

On-Line Pattern Matching on D-Texts (Invited Talk)

Nadia Pisanti
2021-01-01

Abstract

The Elastic Degenerate String Matching (EDSM) problem is defined as that of finding an occurrence of a pattern P of length m in an ED-text T . A D-text (Degenerate text) is a string that actually represents a set of similar and aligned strings (e.g. a pan-genome [5]) by collapsing common fragments into a standard string, and representing variants with sets of alternative substrings. When such substrings are not bound to have the same size, then we talk about elastic D-strings (ED-strings). In [6] we gave an O(nm2 + N ) time on-line algorithm for EDSM, where n is the length of T and N is its size, defined as the total number of letters. A fundamental toolkit of our algorithm is the O(m2+N ) time solution of the later called Active Prefixes problem (AP). In [2], a O(m1.5√log m+N ) solution for AP was shown, leading to a O(nm1.5√log m + N ) time solution for EDSM. The natural open problem was thus whether the 1.5 exponent could furtherly be decreased. In [3], we prove several properties that answer this and other questions: we give a conditional O(nm1.5 + N ) lower bound for EDSM, proving that a combinatorial algorithm solving EDSM in O(nm1.5−ε + N ) time would break the Boolean Matrix Multiplication (BMM) conjecture; we use this result as a hint to devise a non-combinatorial algorithm that solves EDSM in O(nm1.381 + N ) time; we do so by successfully combining Fast Fourier Transform and properties of string periodicity. In my talk I will overview the results above, as well as some interesting side results: the extension to a dictionary rather than a single pattern [7], the introduction of errors [4], and a notion of matching among D-strings with its linear time solution [1]
2021
978-3-95977-186-3
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1120069
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact