CINECA IRIS Institutional Research Information System

Towards a crowdsourced framework for online hate speech moderation - a case study in the Indian political scenario

Bhattacharya, Avigyan;Chakrabarti, Tapabrata;Basu, Subhadip;Knott, Alistair;Pedreschi, Dino;Chatila, Raja;Leavy, Susan;Eyers, David;Teal, Paul D.;Biecek, Przemyslaw

2024-01-01

Abstract

The proliferation of online political hate speech through social media has been a persistent problem and has recently been compounded by the arrival of AI-boosted content. This can lead to the wanton dissemination of misinformation and disinformation, and can cause extremist radicalisation or influence national electoral processes. Given the high stakes of negative social impact, it is becoming increasingly important to address the sensitive topic of content moderation on social media platforms, where the debate centres on the tension between free speech and content harm. From that perspective, it is crucial to establish a nuanced definition and categorisation of harmful content that is sensitive to the culture and language of the place of dissemination, in contrast to the current one-size-fits-all approach, where content moderation is performed by social media companies behind closed doors. In this paper, we present a democratized solution to this problem: a crowdsourced annotation process that may be used as a transparent method of identifying harmful content, which can then inform moderation decisions such as contextually weighted downranking of harmful content. We present proof-of-concept case studies in Indian political electoral discourse. We introduce a curated dataset of tweets labeled by multiple annotators from diverse backgrounds and visualize the statistical patterns that emerge from it. This is the first stage of a multi-year Global Partnership on AI (GPAI) project on responsible AI for social media governance. In 2024 and beyond, we plan to expand the work to include both memes and tweets that are multilingual (a mixture of Hindi/Bengali, English, and romanised Hindi/Bengali).

Year

2024

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1281642

Warning: the data displayed have not been validated by the university.
