Decoding Word Embeddings with Brain-Based Semantic Features

Alessandro Lenci (last author)
Contributor role: Conceptualization
Date: 2021-01-01

Abstract

Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT).
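The mapping described in the abstract can be illustrated with a minimal sketch: a regularized linear regression learned from embedding dimensions to the Binder et al. (2016) feature ratings, evaluated with leave-one-out cross-validation over words. The ridge model, the file names, and the Spearman-based evaluation below are illustrative assumptions, not the paper's exact pipeline.

# Minimal sketch (assumptions noted in comments), not the paper's exact method.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

# Hypothetical input files:
# X: one row per word, columns = embedding dimensions (e.g., Skip-Gram, GloVe, ELMo, or BERT)
# Y: same rows, columns = Binder et al. (2016) semantic feature ratings
X = np.load("embeddings.npy")
Y = np.load("binder_features.npy")

correlations = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = Ridge(alpha=1.0)                   # assumed regularized linear mapping
    model.fit(X[train_idx], Y[train_idx])      # learn embeddings -> feature vectors
    pred = model.predict(X[test_idx])[0]       # decoded feature profile of held-out word
    rho, _ = spearmanr(pred, Y[test_idx][0])   # compare with the gold feature ratings
    correlations.append(rho)

print(f"Mean Spearman rho over held-out words: {np.mean(correlations):.3f}")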
Year: 2021
Authors: Chersoni, Emmanuele; Santus, Enrico; Huang, Chu-Ren; Lenci, Alessandro
Files in this record:
File: Chersoni_etal_CL_2021.pdf (open access)
Description: Main article
Type: Final published version
License: Creative Commons
Size: 854.3 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1134752
Citations
  • PubMed Central: not available
  • Scopus: 22
  • Web of Science (ISI): 14