MADMX: A Strategy for Maximal Dense Motif Extraction

Grossi, Roberto; Pisanti, Nadia
2011-01-01

Abstract

We develop, analyze, and experiment with a new tool, called MADMX, which extracts frequent motifs from biological sequences. We introduce the notion of density to single out the "significant" motifs. The density is a simple and flexible measure for bounding the number of don't cares in a motif, defined as the fraction of solid (i.e., different from don't care) characters in the motif. A maximal dense motif has density above a certain threshold, and any further specialization of a don't care symbol in it or any extension of its boundaries decreases its number of occurrences in the input sequence. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX.
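
The density measure described in the abstract can be illustrated with a minimal Python sketch. This is not the MADMX implementation: the function names, the use of `.` as the don't care symbol, and the non-strict threshold comparison are all assumptions made for illustration, and the paper's formal definition of a dense motif may be more refined than the plain fraction used here.

```python
DONT_CARE = "."  # assumed symbol for a don't care position


def density(motif):
    """Fraction of solid (non-don't-care) characters in the motif."""
    if not motif:
        raise ValueError("motif must be non-empty")
    solid = sum(1 for c in motif if c != DONT_CARE)
    return solid / len(motif)


def is_dense(motif, threshold):
    """True if the motif's density reaches the threshold.

    Whether the comparison should be strict is an assumption.
    """
    return density(motif) >= threshold


def occurrences(motif, sequence):
    """Start positions where the motif matches the sequence.

    A don't care matches any character; a solid character must match exactly.
    """
    m = len(motif)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if all(p == DONT_CARE or p == s for p, s in zip(motif, sequence[i:i + m]))
    ]
```

For instance, on the made-up sequence `ACGTACCTACGA`, the motif `AC.T` has density 0.75 and occurs at positions 0 and 4, whereas its specialization `ACGT` occurs only at position 0. Specializing the don't care lowers the occurrence count, which is the behavior the maximality condition refers to.
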
2011
Grossi, Roberto; Pietracaprina, A; Pisanti, Nadia; Pucci, G; Upfal, E; Vandin, F.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/196000
Citations
  • PMC ND
  • Scopus 14
  • ISI 12