BureauBERTo: adapting UmBERTo to the Italian bureaucratic language

Serena Auriemma; Mauro Madeddu; Martina Miliani; Alessandro Bondielli; Lucia C. Passaro; Alessandro Lenci
2023-01-01

Abstract

In this work, we introduce BureauBERTo, the first transformer-based language model adapted to the Italian Public Administration (PA) and technical-bureaucratic domains. We further pre-trained the general-purpose Italian model UmBERTo on a corpus of PA, banking, and insurance documents, and we expanded UmBERTo’s vocabulary with domain-specific terms. We show that BureauBERTo benefited from this adaptation by comparing it with UmBERTo in both an intrinsic and an extrinsic evaluation. The intrinsic evaluation was conducted through specific fill-mask experiments. The extrinsic evaluation consisted of a named entity recognition task on one of the sub-domains covered by BureauBERTo.
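
As a rough illustration of how a fill-mask comparison between two models might be scored, the sketch below computes top-k accuracy over masked tokens. It is not the evaluation protocol of the paper, which the abstract does not detail: the function name `top_k_accuracy`, the choice of metric, and the toy candidate lists are all assumptions made up for this example, and the numbers it produces say nothing about the real models.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   gold_tokens: Sequence[str],
                   k: int = 5) -> float:
    """Fraction of masked positions whose gold token is in the top-k candidates.

    ranked_predictions[i] holds a model's candidates for the i-th masked
    token, ordered from most to least probable.
    """
    if len(ranked_predictions) != len(gold_tokens):
        raise ValueError("one prediction list is required per gold token")
    if not gold_tokens:
        return 0.0
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_predictions, gold_tokens)
    )
    return hits / len(gold_tokens)


# Invented toy data (NOT results from the paper): gold tokens for three
# masked sentences and hypothetical ranked candidates from two models.
gold: List[str] = ["comune", "polizza", "delibera"]
general_model = [
    ["paese", "comune", "governo"],
    ["carta", "banca", "polizza"],
    ["legge", "norma", "regola"],
]
adapted_model = [
    ["comune", "ente", "municipio"],
    ["polizza", "garanzia", "carta"],
    ["determina", "delibera", "legge"],
]

for name, predictions in [("general", general_model), ("adapted", adapted_model)]:
    for k in (1, 3):
        print(f"{name} model, top-{k} accuracy: {top_k_accuracy(predictions, gold, k):.2f}")
```

In practice the ranked candidates would come from each model's masked-token predictions over the same set of domain sentences, so that the two scores are directly comparable.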

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1205508