Query processing is one of the main bo.lenecks in large-scale search engines. Retrieving the top k most relevant documents for a given query can be extremely expensive, as it involves scoring large amounts of documents. Several dynamic pruning techniques have been introduced in the literature to tackle this problem, such as BlockMaxWAND, which splits the inverted index into constantsized blocks and stores the maximum document-Term scores per block; this information can be used during query execution to safely skip low-score documents, producing many-fold speedups over exhaustive methods. We introduce a re.nement for BlockMaxWANDthat uses variablesized blocks, rather than constant-sized. We set up the problem of deciding the block partitioning as an optimization problem which maximizes how accurately the block upper bounds represent the underlying scores, and describe an effcient algorithm to .nd an approximate solution, with provable approximation guarantees. .rough an extensive experimental analysis we show that our method significantly outperforms the state of the art roughly by a factor 2x. We also introduce a compressed data structure to represent the additional block information, providing a compression ratio of roughly 50%, while incurring only a small speed degradation, no more than 10% with respect to its uncompressed counterpart.

Faster blockmaxwand with variable-sized blocks

Mallia, Antonio;Ottaviano, Giuseppe;Tonellotto, Nicola;Venturini, Rossano
2017-01-01

Abstract

Query processing is one of the main bo.lenecks in large-scale search engines. Retrieving the top k most relevant documents for a given query can be extremely expensive, as it involves scoring large amounts of documents. Several dynamic pruning techniques have been introduced in the literature to tackle this problem, such as BlockMaxWAND, which splits the inverted index into constantsized blocks and stores the maximum document-Term scores per block; this information can be used during query execution to safely skip low-score documents, producing many-fold speedups over exhaustive methods. We introduce a re.nement for BlockMaxWANDthat uses variablesized blocks, rather than constant-sized. We set up the problem of deciding the block partitioning as an optimization problem which maximizes how accurately the block upper bounds represent the underlying scores, and describe an effcient algorithm to .nd an approximate solution, with provable approximation guarantees. .rough an extensive experimental analysis we show that our method significantly outperforms the state of the art roughly by a factor 2x. We also introduce a compressed data structure to represent the additional block information, providing a compression ratio of roughly 50%, while incurring only a small speed degradation, no more than 10% with respect to its uncompressed counterpart.
2017
9781450350228
File in questo prodotto:
File Dimensione Formato  
vbmw.pdf

solo utenti autorizzati

Tipologia: Documento in Post-print
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 891.84 kB
Formato Adobe PDF
891.84 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
3077136.3080780.pdf

solo utenti autorizzati

Tipologia: Versione finale editoriale
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 1.36 MB
Formato Adobe PDF
1.36 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/887216
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 47
  • ???jsp.display-item.citation.isi??? 33
social impact