An encoding for order-preserving matching

Gagie, Travis; Manzini, Giovanni; Venturini, Rossano

2017-01-01

Abstract

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., 4, 1, 3, 2 and 10, 3, 7, 5 are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size σ and a constant c ≥ 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m ≤ log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log σ = Ω(log log n); our query time is optimal if log σ = Ω(log n). Our space bound contrasts with the Ω(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.
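To make the definition of an order-preserving match concrete, the following is a naive sketch in Python. It is not the O(n log log n)-bit encoding described in the abstract: it keeps the whole string S and compares rank patterns window by window, taking roughly O(n m log m) time. The function names are hypothetical, positions are 0-based (the abstract indexes strings from 1), and treating equal characters as required to stay equal is an assumption about how ties are handled.

```python
def rank_pattern(seq):
    """Map each element to its rank among the distinct values of seq.

    Two sequences of equal length have the same rank pattern exactly
    when every pair of positions compares the same way in both.
    """
    rank_of = {value: rank for rank, value in enumerate(sorted(set(seq)))}
    return [rank_of[value] for value in seq]


def is_op_match(a, b):
    """Return True if a and b are an order-preserving match."""
    return len(a) == len(b) and rank_pattern(a) == rank_pattern(b)


def op_occurrences(text, pattern):
    """Return the 0-based start positions of all order-preserving
    occurrences of pattern in text (naive baseline, assumption:
    the full text is available, unlike in an encoding)."""
    m = len(pattern)
    target = rank_pattern(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_pattern(text[i:i + m]) == target
    ]


# The example pair from the abstract.
print(is_op_match([4, 1, 3, 2], [10, 3, 7, 5]))  # True

# Counting and locating occurrences in a small, made-up string.
S = [5, 2, 4, 3, 9, 1, 6, 4]
P = [4, 1, 3, 2]
print(op_occurrences(S, P))       # [0, 4]
print(len(op_occurrences(S, P)))  # 2
```

The two queries the abstract mentions correspond to the length of the returned list (the number of occurrences) and any one of its elements (the starting position of some match).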

Year: 2017
ISBN: 9783959770491
Appears in categories: 4.1 Conference proceedings contribution

Files in this item:

File	Size	Format
op-encoding-rev.pdf (open access; type: post-print document; license: Creative Commons)	419.56 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/887207
