Blog Mining Through Opinionated Words

ATTARDI, GIUSEPPE;SIMI, MARIA
2006-01-01

Abstract

Intent mining is a kind of document analysis whose goal is to assess the attitude of the document's author toward a given subject. Opinion mining is a kind of intent mining in which the attitude is a positive or negative opinion. Most systems tackle the problem with a two-step approach: an information retrieval phase followed by a post-processing or filtering phase that identifies opinionated blogs. We explored a single-stage approach to opinion mining, retrieving opinionated documents ranked by a special ranking function that exploits an index enriched with opinion tags. A set of subjective words is used as tags for identifying opinionated sentences. These words are marked as “opinionated” and are used in the retrieval phase to boost the rank of documents containing them. In indexing the collection, we recovered the relevant content from the blog permalink pages, exploiting HTML metadata about the generator and heuristics to remove irrelevant parts from the body. The index also records the occurrences of opinionated words, which were extracted from an analysis of WordNet glosses. The experiments compared the precision of normal queries with that of queries that added, as a constraint, proximity to an opinionated word. The results show a significant improvement in precision for both topic relevance and opinion relevance.
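
The idea of boosting documents in which a query term occurs close to an opinionated word can be illustrated with a minimal sketch. This is not the ranking function used in the paper, whose details the abstract does not give. The word list, the window size, the boost factor, and the function names `tokenize`, `score`, and `rank` are all assumptions made for illustration.

```python
import re

# Hypothetical lexicon for illustration only; the paper extracts its
# opinionated words from an analysis of WordNet glosses.
OPINION_WORDS = {"love", "hate", "awful", "great", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(doc, query, window=5, boost=2.0):
    """Score a document for a query.

    Each occurrence of a query term counts 1.0, or `boost` when an
    opinionated word lies within `window` tokens of it. Both parameters
    are assumed values, not figures taken from the paper.
    """
    tokens = tokenize(doc)
    query_terms = set(tokenize(query))
    opinion_positions = [i for i, t in enumerate(tokens) if t in OPINION_WORDS]
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in query_terms:
            continue
        near = any(abs(i - j) <= window for j in opinion_positions)
        total += boost if near else 1.0
    return total


def rank(docs, query, **kwargs):
    """Return the documents sorted by descending score."""
    return sorted(docs, key=lambda d: score(d, query, **kwargs), reverse=True)
```

With this sketch, a post such as "I really love this phone, it is great." ranks above a neutral post that merely mentions "phone", because the query term falls inside the window of an opinionated word. A real system would keep the opinion tags in the index rather than recompute them at query time.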

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/186353