Efficient Multi-vector Dense Retrieval with Bit Vectors

Nardini, F. M.; Rulli, C.; Venturini, R.

doi:10.1007/978-3-031-56060-6_1

Dense retrieval techniques employ pre-trained large language models to build a high-dimensional representation of queries and passages. These representations compute the relevance of a passage w.r.t. to a query using efficient similarity measures. In this line, multi-vector representations show improved effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID has tackled these problems by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, thus reducing the cost of the successive ranking stages. This paper proposes “Efficient Multi-Vector dense retrieval with Bit vectors” (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, exploiting SIMD instructions, thus reducing its latency. Third, EMVB leverages Product Quantization (PQ) to reduce the memory footprint of storing vector representations while jointly allowing for fast late interaction. Fourth, we introduce a per-document term filtering method that further improves the efficiency of the last step. Experiments on MS MARCO and LoTTE show that EMVB is up to 2.8× faster while reducing the memory footprint by 1.8× with no loss in retrieval accuracy compared to PLAID.

Efficient Multi-vector Dense Retrieval with Bit Vectors

Nardini F. M.;Rulli C.;Venturini R.

2024-01-01

Abstract

Dense retrieval techniques employ pre-trained large language models to build a high-dimensional representation of queries and passages. These representations compute the relevance of a passage w.r.t. to a query using efficient similarity measures. In this line, multi-vector representations show improved effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID has tackled these problems by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, thus reducing the cost of the successive ranking stages. This paper proposes “Efficient Multi-Vector dense retrieval with Bit vectors” (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, exploiting SIMD instructions, thus reducing its latency. Third, EMVB leverages Product Quantization (PQ) to reduce the memory footprint of storing vector representations while jointly allowing for fast late interaction. Fourth, we introduce a per-document term filtering method that further improves the efficiency of the last step. Experiments on MS MARCO and LoTTE show that EMVB is up to 2.8× faster while reducing the memory footprint by 1.8× with no loss in retrieval accuracy compared to PLAID.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2024

Codice ISBN

9783031560590
9783031560606

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1244609

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

16

10

CINECA IRIS Institutional Research Information System

Efficient Multi-vector Dense Retrieval with Bit Vectors

Nardini F. M.;Rulli C.;Venturini R.

2024-01-01

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

Attenzione

Citazioni

social impact

CINECA IRIS Institutional Research Information System

Efficient Multi-vector Dense Retrieval with Bit Vectors

Nardini F. M.;Rulli C.;Venturini R.

2024-01-01

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Attenzione

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)