CINECA IRIS Institutional Research Information System

Quality versus efficiency in document scoring with learning-to-rank models

Capannini G.; Lucchese C.; Nardini F. M.; Orlando S.; Perego R.; Tonellotto N.

2016-01-01

Abstract

Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions predict a score for each document, and the scores are then used to rank the documents. Although the scoring efficiency of LtR models is critical in several applications (e.g., it directly affects the response time and throughput of Web query processing), it has received relatively little attention so far. The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking models exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that influence both effectiveness and efficiency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than algorithms that induce simpler models, such as linear combinations of features. We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state-of-the-art LtR algorithms, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets, and we contribute an open source C++ framework providing optimized, multi-threaded implementations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (Ωλ-MART), an algorithm that induces forests of oblivious regression trees. We investigate how the different training parameters affect the quality versus efficiency trade-off, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments show that there is no overall best algorithm: the optimal choice depends on the time budget.
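The idea of choosing the most effective ranker under a time budget can be illustrated with a minimal Python sketch. It is not the authors' methodology or their C++ framework: the `Candidate` type, the function names, and every number in `EXAMPLE_CANDIDATES` are hypothetical placeholders, and the sketch assumes each trained model has already been summarized by one quality score (higher is better) and one per-document scoring cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One trained ranking model, summarized by two measurements (hypothetical type)."""
    name: str
    quality: float   # ranking quality on held-out data, higher is better
    cost_us: float   # scoring cost per document, in microseconds


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only models that no cheaper-or-equal model matches or beats in quality."""
    frontier: List[Candidate] = []
    best_quality = float("-inf")
    # Visit models from cheapest to most expensive; on cost ties, best quality first.
    for c in sorted(candidates, key=lambda c: (c.cost_us, -c.quality)):
        if c.quality > best_quality:
            frontier.append(c)
            best_quality = c.quality
    return frontier


def best_within_budget(candidates: List[Candidate],
                       budget_us: float) -> Optional[Candidate]:
    """Return the highest-quality model whose cost fits the budget, or None."""
    feasible = [c for c in candidates if c.cost_us <= budget_us]
    if not feasible:
        return None
    # Prefer higher quality; break quality ties in favor of the cheaper model.
    return max(feasible, key=lambda c: (c.quality, -c.cost_us))


# Made-up placeholder values for illustration only; not results from the paper.
EXAMPLE_CANDIDATES = [
    Candidate("linear", quality=0.40, cost_us=0.1),
    Candidate("small-forest", quality=0.45, cost_us=2.0),
    Candidate("dominated-forest", quality=0.43, cost_us=3.0),
    Candidate("large-forest", quality=0.48, cost_us=10.0),
]
```

With these placeholder values, `best_within_budget(EXAMPLE_CANDIDATES, 5.0)` selects `small-forest`, while a budget of 10.0 selects `large-forest`. This mirrors the abstract's conclusion that the optimal choice depends on the time budget rather than on a single best algorithm.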

Year: 2016
DOI: https://dx.doi.org/10.1016/j.ipm.2016.05.004
All authors: Capannini, G.; Lucchese, C.; Nardini, F. M.; Orlando, S.; Perego, R.; Tonellotto, N.
Appears in types: 1.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
ltr-performance-ipm.pdf	Open access	Post-print document	Creative Commons	861.18 kB	Adobe PDF
1-s2.0-S0306457316301248-main.pdf	Authorized users only	Final published version	Not public (private/restricted access)	1.13 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1014112
