Fast and compact Hamming distance index

VENTURINI, ROSSANO
2016-01-01

Abstract

Searching for similar objects in a collection is a core task of many applications in databases, pattern recognition, and information retrieval. Because similarity-preserving hash functions such as SimHash exist, indexing these objects reduces to solving the Approximate Dictionary Queries problem. In this problem we have to index a collection of fixed-size keys so as to efficiently retrieve all the keys that are at Hamming distance at most κ from a query key. In this paper we propose new solutions for the Approximate Dictionary Queries problem. These solutions combine succinct data structures with an efficient representation of the keys to significantly reduce the space usage of state-of-the-art solutions without introducing any time penalty. Finally, by exploiting the triangle inequality, we can also significantly speed up the query time of existing solutions.
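To make the problem statement concrete, the following is a minimal Python sketch and not the method proposed in the paper. It assumes keys are fixed-width bit strings stored as integers. The names `hamming`, `query_linear_scan`, and `PivotIndex` are hypothetical and introduced only for illustration. The first function is a brute-force baseline; the class shows one simple way the triangle inequality can skip candidates, by storing each key's distance to a single fixed pivot key.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two equal-width keys differ."""
    return bin(a ^ b).count("1")


def query_linear_scan(keys: List[int], q: int, kappa: int) -> List[int]:
    """Baseline: compare the query against every key in the collection."""
    return [k for k in keys if hamming(k, q) <= kappa]


class PivotIndex:
    """Illustrative index (an assumption, not the paper's structure):
    precomputes the distance from every key to one fixed pivot."""

    def __init__(self, keys: List[int], pivot: int) -> None:
        self.pivot = pivot
        self.entries: List[Tuple[int, int]] = [(hamming(k, pivot), k) for k in keys]

    def query(self, q: int, kappa: int) -> List[int]:
        dq = hamming(q, self.pivot)
        result = []
        for dk, k in self.entries:
            # Triangle inequality: |d(q, p) - d(k, p)| <= d(q, k).
            # If the left side already exceeds kappa, k cannot match.
            if abs(dq - dk) > kappa:
                continue
            if hamming(k, q) <= kappa:
                result.append(k)
        return result


keys = [0b0000, 0b0001, 0b0111, 0b1111]
index = PivotIndex(keys, pivot=0b0000)
print(query_linear_scan(keys, 0b0011, 1))  # [1, 7]
print(index.query(0b0011, 1))              # [1, 7]
```

In this toy run the pivot filter discards two of the four keys without computing their distance to the query. The scan over `entries` is still linear; it only illustrates the pruning condition, not a space-efficient or sublinear index.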
2016
978-1-4503-4290-2
978-1-4503-4069-4

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/800751
Citations
  • Scopus 14