Ecological Validity Missing in AI-Assisted Clinical Decision Support Research: Why Real-World Context Matters
Tommaso Turchi; Daria Mikhaylova; Alessio Malizia; Mario Giovanni C. A. Cimino; Federico Andrea Galatolo
2025-01-01
Abstract
This paper presents a critical perspective on the ecological validity challenges in evaluating AI-assisted decision-making tools for healthcare, illustrated through insights from a case study on oral cancer diagnosis. We argue that current experimental approaches often fail to capture the complexities of clinical environments in three critical dimensions: the temporal dynamics of decision-making, the holistic nature of clinical reasoning, and the multifaceted requirements for performance evaluation. Our case study with ten dental care specialists of varying experience levels revealed significant misalignments between our controlled experimental design and the realities of clinical practice. Participants’ qualitative feedback highlighted how real-world diagnosis involves contextual information beyond images, follows different temporal patterns than rapid experimental tasks, and requires evaluation metrics beyond simple accuracy. Based on these observations, we suggest pathways for enhancing ecological validity in AI healthcare research: incorporating longitudinal evaluation approaches, designing systems that integrate multiple information streams, and developing nuanced performance metrics that reflect clinical priorities. This work contributes to the ongoing dialogue about bridging the gap between AI research and its practical implementation in high-stakes medical settings.


