Evaluating Pre-Trained Transformers on Italian Administrative Texts

Auriemma S.; Bondielli A.; Passaro L. C.; Lenci A.
2022-01-01

Abstract

In recent years, Transformer-based models have been widely used in NLP for a variety of downstream tasks and domains. However, a language model built specifically for Italian administrative language is still lacking. In this paper, we therefore compare the performance of five Transformer models, pre-trained on general-purpose texts, on two tasks in the Italian administrative domain: Named Entity Recognition (NER) and multi-label document classification on Public Administration (PA) documents. We evaluate each model on both tasks to identify the best model for this domain. We also discuss the effect of model size and pre-training data on performance on domain data. Our evaluation identifies UmBERTo as the best-performing model, with an accuracy of 0.71 and an F1 score of 0.89 on multi-label document classification, and an F1 score of 0.87 on NER-PA.
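The abstract reports both an accuracy and an F1 score for multi-label classification, and the two can differ considerably. The sketch below illustrates one common way such metrics are computed. It assumes that accuracy means exact-match (subset) accuracy and that F1 is micro-averaged. The abstract does not state which variants the authors used, so both choices are assumptions, and the label names and predictions are invented for illustration only.

```python
from typing import List, Set


def subset_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of documents whose predicted label set equals the gold set exactly."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool label decisions over all documents, then compute F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted label sets for four documents (not from the paper).
gold = [{"tax", "tender"}, {"personnel"}, {"tender"}, {"tax", "environment"}]
pred = [{"tax", "tender"}, {"personnel"}, {"tender", "tax"}, {"tax"}]

print(f"subset accuracy: {subset_accuracy(gold, pred):.2f}")
print(f"micro F1:        {micro_f1(gold, pred):.2f}")
```

Under these assumptions, a document with one wrong or missing label counts as fully incorrect for subset accuracy but still contributes its correct labels to micro F1, which is one plausible reason an accuracy figure can sit well below the F1 figure on the same task.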
Files in this record:

paper4-AixPA.pdf
Access: open access
Description: Natural Language Processing, Evaluation of Neural Language Models, Domain Language, Public Administration
Type: Final published version
License: Creative Commons
Size: 1.09 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1160772