ChatGPT v4 Outperforms v3.5 on Cancer Treatment Recommendations: Quality Assessment and Concordance with Clinical Guidelines and Expert Opinions
Chung-You Tsai1,2*, Shyi-Chun Yii1,3, Pai-Yu Cheng1,3, Jen-Ting Hsu1
1Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei, Taiwan
2Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
3Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
*Correspondence to: Chung-You Tsai, MD, PhD
Purpose: To assess the quality and alignment of ChatGPT’s cancer treatment recommendations (RECs) with National Comprehensive Cancer Network (NCCN) guidelines and expert opinions.
Materials and Methods: Three urologists performed a quantitative and qualitative assessment in October 2023, analyzing responses from ChatGPT-4 and ChatGPT-3.5 to 108 prostate, kidney, and bladder cancer prompts built from two zero-shot prompt templates. Performance was evaluated with five ratios: expert-approved, expert-disagreed, and NCCN-aligned RECs as proportions of total ChatGPT RECs, plus rates of coverage of and adherence to the NCCN guidelines. Experts rated response quality on a 1-5 scale for correctness, comprehensiveness, specificity, and appropriateness.
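The five ratios above can be sketched as simple proportions. The snippet below is a minimal illustration only: the exact operational definitions (particularly how coverage and adherence are computed) are assumptions made for clarity, and all counts are invented, not study data.

```python
# Hypothetical sketch of the five evaluation metrics described in Methods.
# Definitions of coverage_rate and adherence_rate are assumed readings of
# the abstract, not the study's published formulas.

def evaluate(total_recs, approved, disagreed, aligned,
             guideline_options, options_covered,
             fully_adherent_cases, total_cases):
    """Return the five ratios as percentages, rounded to one decimal."""
    pct = lambda num, den: round(100.0 * num / den, 1)
    return {
        "approved_ratio": pct(approved, total_recs),     # expert-approved RECs / all RECs
        "disagreed_ratio": pct(disagreed, total_recs),   # expert-disagreed RECs / all RECs
        "nccn_aligned_ratio": pct(aligned, total_recs),  # NCCN-aligned RECs / all RECs
        "coverage_rate": pct(options_covered, guideline_options),  # NCCN options mentioned
        "adherence_rate": pct(fully_adherent_cases, total_cases),  # cases with no off-guideline REC
    }

# Invented example counts for one batch of prompts:
print(evaluate(total_recs=60, approved=54, disagreed=6, aligned=50,
               guideline_options=40, options_covered=30,
               fully_adherent_cases=8, total_cases=10))
```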
Results: ChatGPT-4 outperformed v3.5 on prostate cancer inquiries, with a higher word count (317.3 vs. 124.4; p<0.001) and more total RECs (6.1 vs. 3.9; p<0.001). Its rater-approved REC ratio (96.1% vs. 89.4%) and alignment with NCCN guidelines (76.8% vs. 49.1%, p=0.001) were superior, and it scored significantly better on all quality dimensions. Across all 108 prompts spanning the three cancers, ChatGPT-4 produced an average of 6.0 RECs per case, with an 88.5% rater approval rate, 86.7% NCCN concordance, and only a 9.5% disagreement rate. It achieved high marks in correctness (4.5), comprehensiveness (4.4), specificity (4.0), and appropriateness (4.4). Subgroup analyses by cancer type, disease status, and prompt template are also reported.
Conclusions: ChatGPT-4 demonstrated significant improvement in providing accurate and detailed treatment recommendations for urological cancers in line with clinical guidelines and expert opinion. However, it is vital to recognize that AI tools are not without flaws and should be used with caution. ChatGPT can supplement, but not replace, personalized advice from healthcare professionals.