In recent years, Transformer-based models have been widely used in NLP for various downstream tasks and in different domains. However, a language model explicitly built for the Italian administrative language is still lacking. Therefore, in this paper, we decided to compare the performance of five different Transformer models, pre-trained on general purpose texts, on two main tasks in the Italian administrative domain: Name Entity Recognition and multi-label document classification on Public Administration (PA) documents. We evaluate the performance of each model on both tasks to identify the best model in this particular domain. We also discuss the effect of model size and pre-training data on the performances on domain data. Our evaluation identifies UmBERTo as the best-performing model, with an accuracy of 0.71, an F1 score of 0.89 for multi-label document classification, and an F1 score of 0.87 for NER-PA.
Evaluating Pre-Trained Transformers on Italian Administrative Texts
Auriemma S.;Bondielli A.;Passaro L. C.;Lenci A.
2022-01-01
Abstract
In recent years, Transformer-based models have been widely used in NLP for various downstream tasks and in different domains. However, a language model explicitly built for the Italian administrative language is still lacking. Therefore, in this paper, we decided to compare the performance of five different Transformer models, pre-trained on general purpose texts, on two main tasks in the Italian administrative domain: Name Entity Recognition and multi-label document classification on Public Administration (PA) documents. We evaluate the performance of each model on both tasks to identify the best model in this particular domain. We also discuss the effect of model size and pre-training data on the performances on domain data. Our evaluation identifies UmBERTo as the best-performing model, with an accuracy of 0.71, an F1 score of 0.89 for multi-label document classification, and an F1 score of 0.87 for NER-PA.File | Dimensione | Formato | |
---|---|---|---|
paper4-AixPA.pdf
accesso aperto
Descrizione: Natural Language Processing, Evaluation of Neural Language Models, Domain Language, Public Administration
Tipologia:
Versione finale editoriale
Licenza:
Creative commons
Dimensione
1.09 MB
Formato
Adobe PDF
|
1.09 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.