
Voices of the Great War: A richly annotated corpus of Italian texts on the First World War

Alessandro Lenci; Lucia Passaro
2020-01-01

Abstract

Voci della Grande Guerra (“Voices of the Great War”) is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view, it gives an account of the wide range of varieties in which Italian was articulated in that period, namely from the diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. Second, from the historical perspective, through a collection of texts belonging to different genres, it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is annotated with lemmas, parts of speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web interface for navigating it.
2020
ISBN: 979-10-95546-34-4
Files in this record:
Lenci-et-al.-2020-Voices-of-the-great-war-A-richly-annotated-corpus-of-Italian-texts-on-the-first-world-war-annotated.pdf
Access: open access
Description: Main article
Type: Publisher's final version
License: Creative Commons
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1070127