Explaining Transformers Through Similarity Difference and Uniqueness Masks – A Pilot Study

Marco Parola; Mario G. C. A. Cimino
In press

Abstract

The widespread deployment of transformers in text classification creates the need for interpretable AI systems, particularly in regulatory-sensitive domains where transparency is mandatory. This pilot study presents the first adaptation of the Similarity Difference and Uniqueness (SIDU) method, originally developed for CNN explainability, to transformer architectures. We address the challenge of bridging spatial feature maps to sequential token representations by exploring two masking strategies: Persistent Homology masking, which utilizes angular distances between the [CLS] token and context tokens, and Cosine Similarity masking, based on semantic relationships. Our approach operates on final hidden layer representations, requiring multiple forward passes to evaluate different mask configurations and compute similarity-uniqueness scores for token-level explanations. Through quantitative and qualitative evaluation across diverse text classification scenarios, from movie reviews to legal document processing, we investigate how transformer hidden states can be leveraged for explainability. To support this evaluation, we introduce a novel metric called Average Token Activation, which captures the mean activation of individual tokens without relying on any threshold mechanisms typical of XAI plausibility evaluation metrics. Our findings reveal robust performance across different domains and classification setups, providing the first insights into the potential and limitations of this cross-domain XAI adaptation approach.
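The Cosine Similarity masking strategy described above can be illustrated with a minimal sketch: given the final-hidden-layer [CLS] vector and the context-token vectors, binary token masks are derived by thresholding each token's cosine similarity to [CLS]. The function name, the threshold values, and the thresholding scheme below are illustrative assumptions, not the paper's implementation; SIDU would then run additional forward passes over the masked inputs to compute similarity-difference and uniqueness scores.

```python
import numpy as np

def cosine_similarity_masks(cls_vec, token_vecs, thresholds=(0.25, 0.5, 0.75)):
    """Hypothetical sketch of Cosine Similarity masking: build one binary
    token mask per threshold by keeping tokens whose final-hidden-layer
    embedding is at least that cosine-similar to the [CLS] embedding.
    Illustrative only; not the authors' implementation."""
    cls_n = cls_vec / np.linalg.norm(cls_vec)
    tok_n = token_vecs / np.linalg.norm(token_vecs, axis=1, keepdims=True)
    sims = tok_n @ cls_n  # cosine similarity of each token to [CLS]
    # One mask per threshold; each mask selects the tokens to retain
    # in one of the additional forward passes.
    masks = [(sims >= t).astype(float) for t in thresholds]
    return sims, masks

# Toy usage with random embeddings standing in for transformer hidden states.
rng = np.random.default_rng(0)
cls_vec = rng.normal(size=8)          # [CLS] embedding
token_vecs = rng.normal(size=(5, 8))  # five context-token embeddings
sims, masks = cosine_similarity_masks(cls_vec, token_vecs)
```

Raising the threshold can only shrink the set of retained tokens, so the masks form a nested family — the property that lets the similarity-difference scores compare coarser and finer token subsets.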


Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1345589
