Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Benedetta Muscato; Praveen Bushipaka; Gizem Gezici; Lucia Passaro; Fosca Giannotti; Tommaso Cucinotta
2025-01-01
Abstract
In subjective tasks like stance detection, diverse human perspectives are often simplified into a single ground truth through label aggregation, i.e., majority voting, potentially marginalizing minority viewpoints. This paper presents a Multi-Perspective framework for stance detection that explicitly incorporates annotation diversity by using soft labels derived from both human and large language model (LLM) annotations. Building on a stance detection dataset focused on controversial topics, we augment it with document summaries and new LLM-generated labels. We then compare two approaches: a baseline trained on aggregated hard labels, and a multi-perspective model trained on disaggregated soft labels that capture annotation distributions. Our findings show that multi-perspective models consistently outperform traditional baselines (higher F1 scores), while exhibiting lower model confidence, reflecting the subjectivity of the task. This work highlights the importance of modeling disagreement and promotes a shift toward more inclusive, perspective-aware NLP systems.
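To make the contrast between the two training regimes concrete, the sketch below illustrates one plausible way to derive soft labels from disaggregated human and LLM annotations and to train against them with a soft cross-entropy loss, versus a hard-label baseline built from the majority vote. This is a minimal illustration, not the authors' implementation: the stance label set, the vote counts, and the use of PyTorch are assumptions for the example.

```python
# Minimal sketch (assumed details, not the paper's code): turning
# disaggregated annotations into a soft label distribution and training
# against it, compared with a majority-vote (hard-label) baseline.
import torch
import torch.nn.functional as F

STANCES = ["favor", "against", "neutral"]  # assumed stance label set

def soft_label(counts):
    """Normalize raw annotation counts (human + LLM votes) into a distribution."""
    votes = torch.tensor(counts, dtype=torch.float)
    return votes / votes.sum()

# Example item: three annotators vote "favor", one votes "against",
# and an LLM annotation adds one "neutral" vote (illustrative counts).
target = soft_label([3, 1, 1])             # tensor([0.6, 0.2, 0.2])

# Stand-in for any classifier that outputs one logit per stance class.
logits = torch.randn(1, len(STANCES), requires_grad=True)

# Multi-perspective objective: soft cross-entropy against the full
# annotation distribution, -sum_c p_c * log q_c, averaged over the batch.
soft_loss = -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
soft_loss.backward()

# Hard-label baseline: collapse the distribution to its majority vote.
hard_target = target.argmax().unsqueeze(0)         # class index 0 ("favor")
baseline_loss = F.cross_entropy(logits, hard_target)
```

Under this formulation, the soft-label model is rewarded for spreading probability mass in proportion to annotator disagreement, which is consistent with the lower confidence reported for the multi-perspective models.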


