Lost in Disambiguation: How Instruction-Tuned LLMs Master Lexical Ambiguity

Luca Capone;Serena Auriemma;Martina Miliani;Alessandro Bondielli;Alessandro Lenci
2024-01-01

Abstract

This paper investigates how decoder-only instruction-tuned LLMs handle lexical ambiguity. Two distinct methodologies are employed: eliciting rating scores from the model via prompting, and analysing the cosine similarity between pairs of polysemous words in context. Ratings and embeddings are obtained by providing pairs of sentences from Haber and Poesio (2021) to the model. These ratings and cosine similarity scores are compared with each other and with the human similarity judgments in the dataset. Surprisingly, the model's ratings show only a moderate correlation with the subjects' similarity judgments and no correlation with the target word embedding similarities. The anisotropy of the vector space is also inspected as a potential explanation for these experimental results. The analysis reveals that the embedding spaces of two of the three analyzed models exhibit poor anisotropy, while the third model shows relatively moderate anisotropy compared to previous findings for models with similar architecture (Ethayarajh 2019). These findings offer new insights into the relationship between generation quality and vector representations in decoder-only LLMs.
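The comparisons described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy vectors, and the toy ratings are hypothetical, Spearman rank correlation is an assumed choice (the abstract does not name the correlation measure), and mean pairwise cosine similarity is used only as one common proxy for anisotropy.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors.

    Values close to 1 suggest the vectors occupy a narrow cone.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy data: one pair of target-word embeddings per sentence
# pair, plus made-up human and model similarity ratings for the same pairs.
embedding_pairs = [
    ([0.9, 0.1, 0.2], [0.8, 0.2, 0.1]),
    ([0.1, 0.9, 0.3], [0.7, 0.1, 0.4]),
    ([0.4, 0.4, 0.8], [0.5, 0.3, 0.9]),
    ([0.2, 0.7, 0.1], [0.9, 0.0, 0.3]),
]
human_ratings = [4.5, 1.5, 4.0, 1.0]
model_ratings = [4.0, 2.0, 3.0, 2.5]

cosine_scores = [cosine(u, v) for u, v in embedding_pairs]
print("model vs human:", spearman(model_ratings, human_ratings))
print("model vs cosine:", spearman(model_ratings, cosine_scores))

all_vectors = [vec for pair in embedding_pairs for vec in pair]
print("mean pairwise cosine:", mean_pairwise_cosine(all_vectors))
```

In an actual replication the vectors would come from the hidden states of the target word in each sentence, and the layer and pooling choices would need to follow the paper itself.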
Files in this item:
2024 Capone et al - Lost in disambiguation.pdf

Access: open access
Type: Final published version
License: Creative Commons
Size: 1.15 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1327952