The paper describes the creation of a reference corpus of nearly 1200 Web forum posts in Italian. The corpus was created by evaluating and customizing a previous proposal for standard XML encoding; a revised version of the relevant DTD is now proposed as a reference for the structural features of Web forum posts, and a set of correspondences with the TEI P5 encoding system is given, with little loss of information. Preliminary results on the syntactic features of the language of the posts are also included to sample the linguistic variability of this textual genre.
|Internal authors:||TAVOSANIS, MIRKO LUIGI AURELIO|
|Authors:||Silvia Petri; TAVOSANIS M|
|Title:||Building a corpus of Italian Web forums: standard encoding issues and linguistic features|
|Publication year:||2009|
|Appears in types:||1.1 Journal article|