CINECA IRIS Institutional Research Information System

What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

Santus, Enrico (first author); Lenci, Alessandro (co-first author); Chiu, Tin Shing (co-first author); Lu, Qin (co-first author); Huang, Chu Ren (co-first author)

2016-01-01

Abstract

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim stems from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To support this claim, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating non-native English speakers applying to US colleges (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
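The abstract does not give APSyn's formula, so the following is only a minimal Python sketch of a rank-weighted context intersection of the kind described above. It assumes that each word is a sparse vector mapping contexts to association weights, that only the top-N contexts are kept, and that each shared context contributes the inverse of its average rank in the two lists. The sketch also shows how a multiple-choice synonym test such as ESL or TOEFL can be scored: the candidate most similar to the target is taken as the answer, and accuracy is the fraction of questions answered correctly. The helper names (`top_contexts`, `apsyn`, `cosine`, `answer`, `accuracy`), the context labels and the toy vectors are all hypothetical.

```python
from math import sqrt


def top_contexts(weights: dict[str, float], n: int) -> dict[str, int]:
    """Return the n most strongly associated contexts with their 1-based ranks."""
    # Sort by descending weight; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda ctx: (-weights[ctx], ctx))[:n]
    return {ctx: rank for rank, ctx in enumerate(ranked, start=1)}


def apsyn(w1: dict[str, float], w2: dict[str, float], n: int = 1000) -> float:
    """Rank-weighted intersection of the top-n contexts of two words.

    Assumed weighting: each shared context adds 1 / (average of its two ranks),
    so contexts ranked highly for BOTH words contribute the most.
    """
    r1, r2 = top_contexts(w1, n), top_contexts(w2, n)
    shared = r1.keys() & r2.keys()
    return sum(1.0 / ((r1[ctx] + r2[ctx]) / 2) for ctx in shared)


def cosine(w1: dict[str, float], w2: dict[str, float]) -> float:
    """Standard vector cosine over sparse context vectors (baseline)."""
    dot = sum(v * w2.get(ctx, 0.0) for ctx, v in w1.items())
    norm1 = sqrt(sum(v * v for v in w1.values()))
    norm2 = sqrt(sum(v * v for v in w2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def answer(target, choices, vectors, measure):
    """Pick the candidate most similar to the target (one multiple-choice item)."""
    return max(choices, key=lambda c: measure(vectors[target], vectors[c]))


def accuracy(questions, vectors, measure):
    """Fraction of (target, choices, gold) questions answered correctly."""
    correct = sum(answer(t, ch, vectors, measure) == gold for t, ch, gold in questions)
    return correct / len(questions)


# Hypothetical toy vectors: context -> association weight (e.g. PPMI).
vectors = {
    "big": {"obj:house": 5.0, "mod:very": 4.0, "subj:grow": 3.0, "obj:dog": 1.0},
    "large": {"obj:house": 4.0, "mod:very": 5.0, "subj:grow": 2.0, "obj:car": 1.0},
    "red": {"obj:car": 5.0, "mod:bright": 4.0, "obj:house": 1.0},
}
questions = [("big", ["red", "large"], "large")]

print(answer("big", ["red", "large"], vectors, apsyn))  # large
print(accuracy(questions, vectors, apsyn))  # 1.0
```

In this sketch only the ranks of the shared contexts enter the score, so the raw association weights matter only through the ordering they induce. The cosine baseline, by contrast, uses the weights directly. The cutoff `n` plays the role of the "most associated contexts" mentioned in the abstract, and its default value here is arbitrary.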


Year: 2016

ISBN: 978-2-9517408-9-1

Appears in types: 4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/843174

Warning

The data displayed have not been validated by the university.
