Advancing Patient Education in Urology: An In-Depth Assessment of ChatGPT-4's Responses to Non-Cancer Queries

Chun-Ping Huang, Chin-Li Chen, Chien-Chang Kao, Ming-Hsin Yang, Chih-Wei Tsao, En Meng, Pei-Jhang Chiang

Division of Urology, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114024, Taiwan

 

Purpose:

To assess the effectiveness and quality of ChatGPT-4's responses to non-cancer-related urological queries as a resource for patient education.

 

Materials and Methods:

A total of 130 non-cancer urology questions were selected, covering six distinct categories: benign prostate disease, stone disease, voiding dysfunction and female urology, andrology and reproductive medicine, infections and inflammatory disease, and pediatric urology. These questions were presented to ChatGPT-4, and the responses were rigorously evaluated for comprehensibility, accuracy, applicability, readability, length, and similarity to the European Association of Urology (EAU) patient information. To measure the similarity between ChatGPT-4's responses and the EAU patient information, we computed cosine similarity using a bag-of-words model. The quality of ChatGPT-4's responses was independently assessed by two specialist urologists using the Section 2 criteria of the DISCERN instrument, a tool for judging the quality of written consumer health information.
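For illustration, the snippet below is a minimal sketch of the bag-of-words cosine similarity measure described above, using scikit-learn. It is an assumed reconstruction, not the study's actual analysis code, and the example texts are invented.

    # Minimal sketch of bag-of-words cosine similarity (illustrative only).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def bow_cosine_similarity(response: str, reference: str) -> float:
        """Cosine similarity between two texts under a bag-of-words model."""
        vectorizer = CountVectorizer()                    # raw token counts, no TF-IDF weighting
        vectors = vectorizer.fit_transform([response, reference])
        return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

    # Example: compare a ChatGPT-4 answer with the corresponding EAU patient text.
    score = bow_cosine_similarity("Kidney stones form when...", "A kidney stone develops when...")
    print(f"cosine similarity: {score:.3f}")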

 

Results:

ChatGPT-4 consistently demonstrated a high level of comprehensibility across all 130 questions, achieving a score of 91.7%. The quality of its responses, as measured by Section 2 of the DISCERN instrument, ranged from 18 to 32, with a median score of 25. However, actionability, an indicator of the practical applicability of the information, averaged 40.0%. The median response length was 2,179 characters, 384.5 words, 22 sentences, and 17 paragraphs.

Readability metrics varied across categories, with Flesch Reading Ease scores ranging from 11.8 (pediatric urology) to 55.6 (stone disease). The Flesch-Kincaid Grade Level placed the text's complexity at roughly the college level, with scores ranging from 8.6 to 17.4. Notably, the misinformation score was consistently low across all categories, underscoring the accuracy of the generated content. The proportion of passive sentences, relevant to gauging active reader engagement, ranged from 0% to 78.5% across the six categories. Cosine similarity between ChatGPT-4's responses and the EAU patient information was generally consistent at 65.3%.
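For context, the readability scores reported above follow the standard Flesch formulas. The sketch below shows how they are computed from word, sentence, and syllable counts; it is illustrative only (the syllable counter is a rough heuristic) and is not the tooling used in the study.

    # Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas (illustrative).
    import re

    def count_syllables(word: str) -> int:
        # Approximate syllables as runs of vowels; a crude heuristic for illustration.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def readability(text: str) -> tuple[float, float]:
        words = re.findall(r"[A-Za-z']+", text)
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences      # words per sentence
        spw = syllables / len(words)      # syllables per word
        flesch_reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
        fk_grade_level = 0.39 * wps + 11.8 * spw - 15.59
        return flesch_reading_ease, fk_grade_level

    ease, grade = readability("Kidney stones are hard deposits that form inside the kidneys.")
    print(f"Flesch Reading Ease: {ease:.1f}, Flesch-Kincaid Grade Level: {grade:.1f}")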

 

Conclusion:

ChatGPT-4 exhibits significant promise as a tool for educating urology patients, owing to its consistently high comprehensibility and low misinformation scores. However, the variable applicability and text complexity of its content suggest a need for further refinement. This study underscores both the potential and the challenges of employing large language models in medical education platforms.
