CINECA IRIS Institutional Research Information System

Space-Efficient Substring Occurrence Estimation

Orlandi, Alessio; Venturini, Rossano

2016-01-01

Abstract

In this paper we study the problem of estimating the number of occurrences of substrings in textual data: a text T of length n over an alphabet Σ=[σ] is preprocessed and an index I is built. The index is used in lieu of the text to answer queries of the form Count≈(P), which return an approximate number of occurrences of an arbitrary pattern P as a substring of T. The problem has its main application in selectivity estimation for the LIKE predicate in textual databases. Our focus is on obtaining an algorithmic solution with guaranteed error rates and a small footprint. To achieve that, we first enrich previous work in the area of compressed text indexing by providing an optimal data structure that, for a given additive error ℓ≥1, requires Θ((n/ℓ) log σ) bits. We also approach the issue of guaranteeing exact answers for sufficiently frequent patterns, providing a data structure whose size scales with the number of such patterns. Our theoretical findings are supported by experiments showing the practical impact of our data structures.
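As an illustration of the query semantics only, the following minimal Python sketch shows what an additive error ℓ and exact answers for sufficiently frequent patterns can mean in practice. It is not the data structure from the paper and makes no attempt to meet its space bound: it simply stores exact counts for substrings that occur at least ℓ times and answers 0 for everything else. The function names and the `max_len` cap on pattern length are assumptions introduced here for the example.

```python
from collections import Counter


def exact_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    last_start = len(text) - len(pattern) + 1
    return sum(text.startswith(pattern, i) for i in range(last_start))


def build_pruned_index(text: str, ell: int, max_len: int) -> dict[str, int]:
    """Naive index (hypothetical helper): keep exact counts only for
    substrings of length <= max_len that occur at least ell times."""
    counts = Counter(
        text[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(text) - k + 1)
    )
    return {s: c for s, c in counts.items() if c >= ell}


def approx_count(index: dict[str, int], pattern: str, max_len: int) -> int:
    """Answer from the index alone, without looking at the text.

    Stored patterns get their exact count. Any other pattern occurs fewer
    than ell times, so answering 0 is off by at most ell - 1.
    """
    if not 1 <= len(pattern) <= max_len:
        raise ValueError("pattern length must be between 1 and max_len")
    return index.get(pattern, 0)


if __name__ == "__main__":
    text, ell, max_len = "abracadabra", 2, 4
    index = build_pruned_index(text, ell, max_len)
    for pattern in ("abra", "cad", "xyz"):
        print(pattern, exact_count(text, pattern), approx_count(index, pattern, max_len))
```

In this toy version a larger ℓ prunes more substrings and so shrinks the index, which mirrors, very loosely, the trade-off between error and space described in the abstract.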

Year: 2016
DOI: https://dx.doi.org/10.1007/s00453-014-9936-y
All authors: Orlandi, Alessio; Venturini, Rossano
Appears in types: 1.1 Journal article

Files in this record:

File	Size	Format
Paper.pdf (open access; type: post-print document; license: all rights reserved)	519.9 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/800763
