Spaced seed design using perfect rulers
Manzini Giovanni
2014-01-01
Abstract
A widely used class of approximate pattern matching algorithms works in two stages, the first being a filtering stage that uses spaced seeds to quickly discard regions where a match is not likely to occur. The design of effective spaced seeds is known to be a hard problem. In this setting, we propose a family of lossless spaced seeds for matching with up to two errors, based on mathematical objects known as perfect rulers. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We identify a specific property of rulers, namely their skewness, which is closely related to the minimum pattern length of the derived seeds. In this context, we study in depth the specific case of Wichmann rulers and investigate the generalization of our approach to the larger class of unrestricted rulers. Although our analysis is mainly of theoretical interest, we show that, for pattern lengths of practical relevance, our seeds have a larger weight, and hence a better filtration efficiency, than those known in the literature.
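As an illustration of the two notions the abstract relies on, the following minimal Python sketch checks by brute force (a) whether a set of marks forms a perfect ruler, meaning that every integer distance from 1 up to the ruler length is measured by some pair of marks, and (b) whether a seed is lossless for pattern length m and k mismatches, meaning that every placement of k errors leaves at least one seed position untouched. The function names `is_perfect_ruler` and `is_lossless`, and the encoding of a seed as a string of '1' (position must match) and '0' (don't care), are assumptions made for this example. It does not reproduce the paper's construction of seeds from rulers, and it is only practical for small m and k.

```python
from itertools import combinations


def is_perfect_ruler(marks, length):
    """Return True if every distance 1..length is measured by a pair of marks."""
    measured = {b - a for a, b in combinations(sorted(marks), 2)}
    return all(d in measured for d in range(1, length + 1))


def is_lossless(seed, m, k):
    """Brute-force check that `seed` is lossless for pattern length m, k errors.

    `seed` is a string over {'1', '0'}: '1' marks a position that must match,
    '0' a don't-care position (an encoding assumed for this sketch).
    Checking exactly k errors suffices: fewer errors can only leave more
    seed placements intact.
    """
    span = len(seed)
    if m < span:
        return False  # the seed does not fit inside the pattern at all
    care = [j for j, c in enumerate(seed) if c == "1"]
    for errors in combinations(range(m), k):
        err = set(errors)
        # Look for a shift where no '1' position lands on an error.
        hit_free = any(
            all(i + j not in err for j in care)
            for i in range(m - span + 1)
        )
        if not hit_free:
            return False
    return True


if __name__ == "__main__":
    print(is_perfect_ruler([0, 1, 4, 6], 6))
    print(is_lossless("101", 4, 1))
    print(is_lossless("111", 9, 2))
```

The minimum pattern length for which `is_lossless` returns True is the quantity the abstract trades off against seed weight (the number of '1' positions): a heavier seed filters more aggressively but generally needs a longer pattern to remain lossless.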