CINECA IRIS Institutional Research Information System

Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely

Carina Kauf; Anna A. Ivanova; Emmanuele Chersoni; Jingyuan Selena She; Zawad Chowdhury; Evelina Fedorenko; Alessandro Lenci; Giulia Rambelli

2023-01-01

Abstract

Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs’ semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018’s BERT to 2023’s MPT) assign a higher likelihood to plausible descriptions of agent−patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher).
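The minimal-pair comparison described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the add-one smoothed bigram model and its three-sentence toy corpus are stand-ins for the pretrained LLMs used in the study, and the names `train_bigram_scorer`, `pair_accuracy`, `toy_score`, and `pairs` are hypothetical. Only the evaluation logic (count a pair as correct when the plausible sentence receives the higher score) follows the text.

```python
import math
from collections import Counter


def train_bigram_scorer(corpus, alpha=1.0):
    """Return a function giving a sentence's add-alpha smoothed bigram log-probability.

    Assumption: this toy model stands in for an LLM's sentence likelihood.
    """
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab_size = len(vocab)

    def score(sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + alpha) / (unigrams[p] + alpha * vocab_size))
            for p, c in zip(tokens, tokens[1:])
        )

    return score


def pair_accuracy(score, pairs):
    """Fraction of (plausible, implausible) pairs where the plausible sentence scores higher."""
    pairs = list(pairs)
    hits = sum(score(plausible) > score(implausible) for plausible, implausible in pairs)
    return hits / len(pairs)


# Hypothetical toy corpus; the study used models trained on large corpora.
toy_score = train_bigram_scorer([
    "the teacher bought the laptop",
    "the teacher read the book",
    "the student bought the book",
])

# The example pair quoted in the abstract.
pairs = [("The teacher bought the laptop", "The laptop bought the teacher")]

print(pair_accuracy(toy_score, pairs))
```

To score pairs with an actual language model, one would replace `toy_score` with any function that maps a sentence to a log-likelihood; `pair_accuracy` does not depend on how the score is produced.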


Year: 2023

DOI: https://dx.doi.org/10.1111/cogs.13386

All authors: Kauf, Carina; Ivanova, Anna A.; Chersoni, Emmanuele; She, Jingyuan Selena; Chowdhury, Zawad; Fedorenko, Evelina; Lenci, Alessandro; Rambelli, Giulia

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1241207

Warning: the data displayed have not been validated by the university.
