CINECA IRIS Institutional Research Information System

A comparative study of machine learning models for hate speech and stereotype detection in Italian texts

Sammartino, Vincenzo

2024-01-01

Abstract

This study presents a comparative analysis of machine learning models for hate speech and stereotype detection in Italian texts. The research uses the datasets from the HaSpeeDe task proposed by EVALITA in 2020. Several text representation techniques are evaluated: non-lexical linguistic information, bag of words, n-grams (of characters, words, and part-of-speech tags), word embeddings, and a neural language model (BERT). The models are compared on accuracy, precision, recall, and F1-score. The results indicate that character n-grams and BERT generally outperform the other techniques: BERT achieves the highest accuracy for hate speech detection (76%), while character n-grams perform best for stereotype detection (72% accuracy). The research highlights the challenges of detecting stereotypes compared with hate speech and emphasises the importance of context in classification tasks.
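As a reading aid for the four metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary task. It is not taken from the article: the function name `binary_metrics` is hypothetical and the labels are invented toy data, not HaSpeeDe data or results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration (1 = hateful, 0 = not hateful).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(gold, pred)
print(scores)
```

Accuracy counts every correct decision, whereas precision, recall, and F1 are defined relative to the positive class, which is why reporting all four gives a fuller picture than accuracy alone.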

Year: 2024
DOI: https://dx.doi.org/10.1504/ijcast.2024.143880
All authors: Sammartino, Vincenzo
Appears in types: 1.1 Journal article

Files in this item:

File	Access	Type	Licence	Size	Format
Hate_Speech_Article.pdf	Open access	Post-print document	All rights reserved	478.84 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1319407
