We present a model for the Strict-Small track of the BabyLM Challenge 2024 (Choshen et al. 2024). We introduce a Curriculum Learning approach for training a specialized version of GPT-2 (Radford et al. 2019), that we name ConcreteGPT. We utilize the norms from (Brysbaert et al. 2014) which provide concreteness ratings for 40,000 English lexical items based on human subjects. Using these norms, we assign a concreteness score to each sentence in the training dataset and develop two curriculum strategies that progressively introduce more complex and abstract language patterns in the training data. Compared to the baselines, our best model shows lower performance on zero-shot tasks but demonstrates superior performance in fine-tuning tasks. Notably, our curriculum-trained models exhibit significant improvements over a non-curriculum based training of the same model.

ConcreteGPT: A Baby GPT-2 Based on Lexical Concreteness and Curriculum Learning

Capone L.
Primo
;
Bondielli A.;Lenci A.
2024-01-01

Abstract

We present a model for the Strict-Small track of the BabyLM Challenge 2024 (Choshen et al. 2024). We introduce a Curriculum Learning approach for training a specialized version of GPT-2 (Radford et al. 2019), that we name ConcreteGPT. We utilize the norms from (Brysbaert et al. 2014) which provide concreteness ratings for 40,000 English lexical items based on human subjects. Using these norms, we assign a concreteness score to each sentence in the training dataset and develop two curriculum strategies that progressively introduce more complex and abstract language patterns in the training data. Compared to the baselines, our best model shows lower performance on zero-shot tasks but demonstrates superior performance in fine-tuning tasks. Notably, our curriculum-trained models exhibit significant improvements over a non-curriculum based training of the same model.
File in questo prodotto:
File Dimensione Formato  
2024 Capone et al - ConcreteGPT.pdf

accesso aperto

Tipologia: Versione finale editoriale
Licenza: Creative commons
Dimensione 322.24 kB
Formato Adobe PDF
322.24 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1327944
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact