What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

LENCI, ALESSANDRO (co-first author)
2016-01-01

Abstract

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting this intersection according to the rank of the shared contexts in the dependency-ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To support it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
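The abstract describes the measure only in words, so the following is a minimal sketch of a rank-weighted context intersection, not the authors' implementation. It assumes that each shared context contributes the inverse of its average rank in the two lists; that weighting, the function name `apsyn`, the `top_n` cut-off parameter, and the toy context lists are all illustrative assumptions.

```python
def apsyn(ranked_a, ranked_b, top_n=1000):
    """Rank-weighted overlap between the top_n contexts of two words.

    ranked_a, ranked_b: lists of contexts, each sorted by decreasing
    association with its target word and containing no duplicates.
    """
    # Map each context to its 1-based rank within the top_n cut-off.
    rank_a = {ctx: i for i, ctx in enumerate(ranked_a[:top_n], start=1)}
    rank_b = {ctx: i for i, ctx in enumerate(ranked_b[:top_n], start=1)}

    # Only contexts that appear in both truncated lists contribute.
    shared = rank_a.keys() & rank_b.keys()

    # ASSUMPTION: weight = 1 / (average of the two ranks), so contexts
    # ranked highly for both words count more than low-ranked ones.
    return sum(2.0 / (rank_a[c] + rank_b[c]) for c in shared)


# Hypothetical toy data: contexts ordered from most to least associated.
car = ["drive", "road", "engine", "wheel"]
automobile = ["engine", "drive", "road", "dealer"]
banana = ["eat", "peel", "road", "yellow"]

score_similar = apsyn(car, automobile)
score_unrelated = apsyn(car, banana)
```

In a multiple-choice synonym test such as ESL or TOEFL, a measure like this would be applied by scoring the target word against each candidate and picking the candidate with the highest score.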
2016
978-2-9517408-9-1
Use this identifier to cite or link to this document: https://hdl.handle.net/11568/843174