An elastic-degenerate string is a sequence of $n$ sets of strings of total length $N$. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g.~pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length $m$ in an elastic-degenerate text. There exists an $cO(nm^2 + N)$-time algorithm to solve the exact version of this problem on-line after a pre-processing stage with time and space $cO(m)$. In this paper, we study the same problem under the edit distance model and present an $cO(k^2mG+kN)$-time and $cO(m)$-space algorithm, where $G$ is the total number of strings in the elastic-degenerate text and $k$ is the maximum edit distance allowed. We also present a simple $cO(kmG+kN)$-time and $cO(m)$-space algorithm for Hamming distance.
Pattern Matching on Elastic-Degenerate Text with Errors
PISANTI, NADIA;ROSONE, GIOVANNA
2017-01-01
Abstract
An elastic-degenerate string is a sequence of $n$ sets of strings of total length $N$. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g.~pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length $m$ in an elastic-degenerate text. There exists an $cO(nm^2 + N)$-time algorithm to solve the exact version of this problem on-line after a pre-processing stage with time and space $cO(m)$. In this paper, we study the same problem under the edit distance model and present an $cO(k^2mG+kN)$-time and $cO(m)$-space algorithm, where $G$ is the total number of strings in the elastic-degenerate text and $k$ is the maximum edit distance allowed. We also present a simple $cO(kmG+kN)$-time and $cO(m)$-space algorithm for Hamming distance.File | Dimensione | Formato | |
---|---|---|---|
BBPR_SPIRE2017.pdf
accesso aperto
Descrizione: Articolo principale
Tipologia:
Documento in Post-print
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
326.85 kB
Formato
Adobe PDF
|
326.85 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.