The problem of fake news detection is becoming increasingly interesting for several research fields. Different approaches have been proposed, based on either the content of the news itself or the context and properties of its spread over time, specifically on social media. In the literature, it does not exist a widely accepted general-purpose dataset for fake news detection, due to the complexity of the task and the increasing ability to produce fake news appearing credible in particular moments. In this paper, we propose a methodology to collect and label news pertinent to specific topics and subjects. Our methodology focuses on collecting data from social media about real-world events which are known to trigger fake news. We propose a labelling method based on crowdsourcing that is fast, reliable, and able to approximate expert human annotation. The proposed method exploits both the content of the data (i.e., the texts) and contextual information about fake news for a particular real-world event. The methodology is applied to collect and annotate the Notre-Dame Fire Dataset and to annotate part of the PHEME dataset. Evaluation is performed with fake news classifiers based on Transformers and fine-tuning. Results show that context-based annotation outperforms traditional crowdsourcing out-of-context annotation.
In-context annotation of Topic-Oriented Datasets of Fake News: A Case study on the Notre-Dame Fire Event
Passaro, Lucia C.
Primo
;Bondielli, Alessandro;Lenci, Alessandro;Marcelloni, Francesco
2022-01-01
Abstract
The problem of fake news detection is becoming increasingly interesting for several research fields. Different approaches have been proposed, based on either the content of the news itself or the context and properties of its spread over time, specifically on social media. In the literature, it does not exist a widely accepted general-purpose dataset for fake news detection, due to the complexity of the task and the increasing ability to produce fake news appearing credible in particular moments. In this paper, we propose a methodology to collect and label news pertinent to specific topics and subjects. Our methodology focuses on collecting data from social media about real-world events which are known to trigger fake news. We propose a labelling method based on crowdsourcing that is fast, reliable, and able to approximate expert human annotation. The proposed method exploits both the content of the data (i.e., the texts) and contextual information about fake news for a particular real-world event. The methodology is applied to collect and annotate the Notre-Dame Fire Dataset and to annotate part of the PHEME dataset. Evaluation is performed with fake news classifiers based on Transformers and fine-tuning. Results show that context-based annotation outperforms traditional crowdsourcing out-of-context annotation.File | Dimensione | Formato | |
---|---|---|---|
1-s2.0-S0020025522008167-main.pdf
non disponibili
Descrizione: Dataset collection and annotation; Fake news; Machine learning
Tipologia:
Versione finale editoriale
Licenza:
NON PUBBLICO - accesso privato/ristretto
Dimensione
2.5 MB
Formato
Adobe PDF
|
2.5 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.