Drifting explanations in continual learning
Andrea Cossu; Francesco Spinnato; Riccardo Guidotti; Davide Bacciu
2024-01-01
Abstract
Continual Learning (CL) trains models on streams of data with the aim of learning new information without forgetting previous knowledge. However, many of these models lack interpretability, making it difficult to understand or explain how they make decisions. This lack of interpretability becomes even more challenging given the non-stationary nature of the data streams in CL. Furthermore, CL strategies aimed at mitigating forgetting directly impact the learned representations. We study the behavior of different explanation methods in CL and propose CLEX (ContinuaL EXplanations), an evaluation protocol to robustly assess how explanations change in Class-Incremental scenarios, where forgetting is pronounced. We observe that models with similar predictive accuracy do not generate similar explanations. Replay-based strategies, well known to be among the most effective in class-incremental scenarios, produce explanations that are aligned with those of a model trained offline. In contrast, naive fine-tuning often results in degenerate explanations that drift away from those of an offline model. Finally, we find that even replay strategies do not always perform at their best when applied to fully-trained recurrent models. Instead, randomized recurrent models (which leverage an untrained recurrent component) clearly reduce the drift of the explanations. This discrepancy between fully-trained and randomized recurrent models, previously known only for their predictive continual performance, thus extends to continual explanations as well.
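The abstract describes CLEX only at a high level; the full protocol is defined in the paper. As a rough, hypothetical illustration of the underlying idea (measuring how far a continually-trained model's explanations drift from those of an offline reference), the sketch below compares plain input-gradient saliency maps via cosine similarity. The function and variable names (`input_gradient_saliency`, `explanation_drift`, `cl_model`, `offline_model`) are placeholders and not the authors' code; CLEX may rely on different explanation methods and drift measures.

```python
# Minimal sketch (not the authors' CLEX implementation): quantify how much the
# explanations of a continually-trained model drift from those of an offline
# reference. Explanations here are plain input gradients (saliency maps);
# drift is 1 - cosine similarity, averaged over a batch of test inputs.
# cl_model, offline_model, x, y are assumed placeholders.

import torch
import torch.nn.functional as F


def input_gradient_saliency(model: torch.nn.Module,
                            x: torch.Tensor,
                            y: torch.Tensor) -> torch.Tensor:
    """Gradient of the target-class logit w.r.t. the input, per sample."""
    model.eval()
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    target = logits.gather(1, y.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(target, x)
    return grad.detach()


def explanation_drift(cl_model: torch.nn.Module,
                      offline_model: torch.nn.Module,
                      x: torch.Tensor,
                      y: torch.Tensor) -> float:
    """Average (1 - cosine similarity) between the two models' saliency maps."""
    s_cl = input_gradient_saliency(cl_model, x, y).flatten(1)
    s_off = input_gradient_saliency(offline_model, x, y).flatten(1)
    cos = F.cosine_similarity(s_cl, s_off, dim=1)
    return (1.0 - cos).mean().item()
```

Under this reading, a lower drift score corresponds to the behavior the abstract attributes to replay-based strategies and randomized recurrent models, while a higher score corresponds to the degenerate explanations produced by naive fine-tuning.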
File | Size | Format
---|---|---
drifting.pdf (open access; final published version; Creative Commons license) | 4.72 MB | Adobe PDF