ChatGPT v4 Outperforms v3.5 on Cancer Treatment Recommendations: Quality Assessment and Concordance with Clinical Guidelines and Expert Opinions
Chung-You Tsai1,2*, Shyi-Chun Yii1,3, Pai-Yu Cheng1,3, Jen-Ting Hsu1
1Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei, Taiwan
2Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
3Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
*Correspondence to: Chung-You Tsai, MD, PhD
Purpose: To assess the quality and alignment of ChatGPT’s cancer treatment recommendations (RECs) with National Comprehensive Cancer Network (NCCN) guidelines and expert opinions.
Materials and Methods: Three urologists performed a quantitative and qualitative assessment in October 2023, analyzing responses from ChatGPT-4 and ChatGPT-3.5 to 108 prostate, kidney, and bladder cancer prompts built from two zero-shot prompt templates. Performance was evaluated with five ratios: expert-approved, expert-disagreed, and NCCN-aligned RECs as proportions of total ChatGPT RECs, plus rates of coverage of and adherence to the NCCN guidelines. Experts rated response quality on a 1-5 scale for correctness, comprehensiveness, specificity, and appropriateness.
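The five ratios above can be sketched as simple proportions. The snippet below is a minimal illustration only: the exact operational definitions (particularly how coverage and adherence are computed) are assumptions made for clarity, and all counts are invented, not study data.

```python
# Hypothetical sketch of the five evaluation metrics described in Methods.
# Definitions of coverage_rate and adherence_rate are assumed readings of
# the abstract, not the study's published formulas.

def evaluate(total_recs, approved, disagreed, aligned,
             guideline_options, options_covered,
             fully_adherent_cases, total_cases):
    """Return the five ratios as percentages, rounded to one decimal."""
    pct = lambda num, den: round(100.0 * num / den, 1)
    return {
        "approved_ratio": pct(approved, total_recs),     # expert-approved RECs / all RECs
        "disagreed_ratio": pct(disagreed, total_recs),   # expert-disagreed RECs / all RECs
        "nccn_aligned_ratio": pct(aligned, total_recs),  # NCCN-aligned RECs / all RECs
        "coverage_rate": pct(options_covered, guideline_options),  # NCCN options mentioned
        "adherence_rate": pct(fully_adherent_cases, total_cases),  # cases with no off-guideline REC
    }

# Invented example counts for one batch of prompts:
print(evaluate(total_recs=60, approved=54, disagreed=6, aligned=50,
               guideline_options=40, options_covered=30,
               fully_adherent_cases=8, total_cases=10))
```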
Results: ChatGPT-4 outperformed v3.5 on prostate cancer inquiries, with a higher word count (317.3 vs. 124.4; p<0.001) and more total RECs (6.1 vs. 3.9; p<0.001). Its rater-approved REC ratio (96.1% vs. 89.4%) and alignment with NCCN guidelines (76.8% vs. 49.1%, p=0.001) were superior, and it scored significantly better on all quality dimensions. Across all 108 prompts spanning the three cancers, ChatGPT-4 produced an average of 6.0 RECs per case, with an 88.5% rater approval rate, 86.7% NCCN concordance, and only a 9.5% disagreement rate. It achieved high marks in correctness (4.5), comprehensiveness (4.4), specificity (4.0), and appropriateness (4.4). Subgroup analyses by cancer type, disease status, and prompt template are also reported.
Conclusions: ChatGPT-4 demonstrated significant improvement in providing accurate and detailed treatment recommendations for urological cancers in line with clinical guidelines and expert opinion. However, it is vital to recognize that AI tools are not without flaws and should be used with caution. ChatGPT can supplement, but not replace, personalized advice from healthcare professionals.