Explaining Transformers Through Similarity Difference and Uniqueness Masks – A Pilot Study
Marco Parola; Mario G. C. A. Cimino
In press
Abstract
The widespread deployment of transformers in text classification creates the need for interpretable AI systems, particularly in regulatory-sensitive domains where transparency is mandatory. This pilot study presents the first adaptation of the Similarity Difference and Uniqueness (SIDU) method, originally developed for CNN explainability, to transformer architectures. We address the challenge of bridging spatial feature maps to sequential token representations by exploring two masking strategies: Persistent Homology masking, which utilizes angular distances between the [CLS] token and context tokens, and Cosine Similarity masking, based on semantic relationships. Our approach operates on final hidden layer representations, requiring multiple forward passes to evaluate different mask configurations and compute similarity-uniqueness scores for token-level explanations. Through quantitative and qualitative evaluation across diverse text classification scenarios, from movie reviews to legal document processing, we investigate how transformer hidden states can be leveraged for explainability. To support this evaluation, we introduce a novel metric called Average Token Activation, which captures the mean activation of individual tokens without relying on any threshold mechanisms typical of XAI plausibility evaluation metrics. Our findings reveal robust performance across different domains and classification setups, providing the first insights into the potential and limitations of this cross-domain XAI adaptation approach.
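To illustrate the general idea behind the two ingredients named in the abstract, here is a minimal sketch, not the authors' implementation: it assumes the final-layer hidden states are available as a NumPy array with the [CLS] vector at index 0, scores each context token by its cosine similarity to [CLS] (the basis of the Cosine Similarity masking strategy), and computes a threshold-free mean of per-token scores in the spirit of the Average Token Activation metric. All function names and the toy data are hypothetical.

```python
import numpy as np

def cosine_similarity_to_cls(hidden_states):
    """Hypothetical sketch: score each context token by the cosine
    similarity of its final-layer hidden state to the [CLS] vector
    (assumed to be the first row). Returns one score per context token."""
    cls_vec = hidden_states[0]          # [CLS] representation
    tokens = hidden_states[1:]          # context token representations
    norms = np.linalg.norm(tokens, axis=1) * np.linalg.norm(cls_vec)
    return tokens @ cls_vec / (norms + 1e-9)  # avoid division by zero

def average_token_activation(saliency):
    """Hypothetical sketch of the Average Token Activation idea:
    the plain mean of per-token saliency, with no threshold applied."""
    return float(np.mean(saliency))

# Toy example: 1 [CLS] token plus 5 context tokens, hidden size 8.
rng = np.random.default_rng(0)
h = rng.normal(size=(6, 8))
scores = cosine_similarity_to_cls(h)       # one value per context token
ata = average_token_activation(np.abs(scores))
```

In the full SIDU pipeline these per-token scores would drive mask generation, followed by repeated forward passes to measure similarity differences and uniqueness between masked predictions; that loop is model-dependent and omitted here.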


