Ecological Validity Missing in AI-Assisted Clinical Decision Support Research: Why Real-World Context Matters
Tommaso Turchi; Daria Mikhaylova; Alessio Malizia; Mario Giovanni C. A. Cimino; Federico Andrea Galatolo
2025-01-01
Abstract
This paper presents a critical perspective on the ecological validity challenges in evaluating AI-assisted decision-making tools for healthcare, illustrated through insights from a case study on oral cancer diagnosis. We argue that current experimental approaches often fail to capture the complexities of clinical environments in three critical dimensions: the temporal dynamics of decision-making, the holistic nature of clinical reasoning, and the multifaceted requirements for performance evaluation. Our case study with ten dental care specialists of varying experience levels revealed significant misalignments between our controlled experimental design and the realities of clinical practice. Participants’ qualitative feedback highlighted how real-world diagnosis involves contextual information beyond images, follows different temporal patterns than rapid experimental tasks, and requires evaluation metrics beyond simple accuracy. Based on these observations, we suggest pathways for enhancing ecological validity in AI healthcare research: incorporating longitudinal evaluation approaches, designing systems that integrate multiple information streams, and developing nuanced performance metrics that reflect clinical priorities. This work contributes to the ongoing dialogue about bridging the gap between AI research and its practical implementation in high-stakes medical settings.


