Building a corpus of Italian Web forums: standard encoding issues and linguistic features

Tavosanis, Mirko Luigi Aurelio
2009-01-01

Abstract

The paper describes the creation of a reference corpus of nearly 1200 Web forum posts in Italian. The corpus was created by evaluating and customizing a previous proposal for standard XML encoding. A revised version of the relevant DTD is now proposed as a reference for the structural features of Web forum posts, and a set of correspondences to the TEI P5 encoding system is given, with little loss of information. Preliminary results on syntactic features of the language of the posts are also included to sample the linguistic variability of this textual genre.
2009
Silvia, Petri; Tavosanis, Mirko Luigi Aurelio
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/130432