Comparative evaluation of large language models for supporting patient health education in COPD: an international pulmonology study of responses generated by ChatGPT-4, Claude 3.5 Sonnet, and Gemini 1.5 Advanced
Marchi, Guido; Gambini, Giulia; Guglielmi, Giacomo; Pistelli, Francesco; Carrozzi, Laura
2025-01-01
Abstract
Three LLMs - ChatGPT-4, Claude 3.5 Sonnet, and Gemini 1.5 Advanced - were evaluated on COPD questions drawn from the GOLD recommendations. Sixty-one pulmonologists from six continents rated 90 AI-generated responses for completeness, accuracy, terminology, accessibility, and safety. Gemini scored highest in completeness and Claude in accuracy and terminology, while no differences emerged in accessibility or safety. Although the results are promising, clinical use requires caution and further validation to ensure safe, accurate patient education.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


