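Testing whether text-mining analysis of a large number of PubMed abstracts can capture medical research trends
許家禎
Department of Urology, Yang Ming Hospital, Chia-Yi City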
Testing text-mining analysis on a large volume of PubMed literature to describe current medical research trends
Jia-Jen Shee
Department of Urology, Yang Ming Hospital, Chia-Yi City, Taiwan
 
Purpose: Keeping up with the research trends of a specific field has become a serious challenge for every specialist. Analyzing human communication to identify the most frequently mentioned trends on an assigned topic is well accepted in social science research, but it has not been used by medical research groups. We developed a software program, STRAYL, that analyzes medical literature and calculates the TF (term frequency: how many times a word is used across all articles) and IDF (inverse document frequency, defined in STRAYL as the number of articles that use the word) of each keyword. Here we demonstrate a successful analysis of a large body of PubMed literature, performed to test the program under real-world conditions.
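The abstract does not describe STRAYL's internals, so the following Python sketch only illustrates the two counts as defined above: TF as the total number of times a dictionary word occurs across all articles, and the per-word article count that the abstract calls IDF. The tokenizer (lowercase alphabetic runs) and the function names are assumptions made for illustration. In standard information-retrieval usage, the article count is called document frequency (df), and inverse document frequency is log(N/df); that conventional form is included for comparison.

```python
import math
import re
from collections import Counter

# Assumption: words are lowercase alphabetic runs; STRAYL's real tokenizer is not described.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens (hypothetical tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def term_statistics(articles, dictionary):
    """Return (tf, df) Counters for dictionary words found in the articles.

    tf[w]: total occurrences of w across all articles (the abstract's TF).
    df[w]: number of articles containing w at least once (what the abstract
           calls IDF; conventionally this is the document frequency).
    """
    tf = Counter()
    df = Counter()
    for text in articles:
        # Keep only tokens that appear in the target dictionary.
        tokens = [t for t in tokenize(text) if t in dictionary]
        tf.update(tokens)       # every occurrence counts toward TF
        df.update(set(tokens))  # each article counts at most once per word
    return tf, df


def conventional_idf(df_count: int, n_docs: int) -> float:
    """Standard inverse document frequency, log(N / df), for comparison."""
    return math.log(n_docs / df_count)
```

For three short example texts in which "trial" appears once in the first and twice in the second, `term_statistics` returns a TF of 3 and an article count of 2 for "trial".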
Materials and Methods: The text-mining program STRAYL, previously presented at the 2019 TUA mid-year meeting, was used for this study. We searched PubMed for cancer immunotherapy literature published in the most recent 5 years, using the query “(immune) and (therapy) and ((cancer) or (neoplasm))”. A medical dictionary of 97,663 words served as the set of target words for mining. Additional STRAYL analyses were performed on subgroups of the resulting collection of 24,081 articles, each subgroup focusing on a specific urologic organ. For each subgroup, one of the following terms was added to the main query “(immune) and (therapy) and ((cancer) or (neoplasm))”: “((urothelial) or (bladder) or (ureter))”, “((kidney) or (renal))”, and “(prostate)”. After STRAYL completed its analyses, the results were collected and exported to Excel for further examination.
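The abstract does not state how the records were retrieved from PubMed. As one hedged way to reproduce the four searches, the Python sketch below builds the main query and the three organ-specific subgroup queries, then formats a request URL for NCBI's E-utilities `esearch` endpoint. The Boolean operators are written in uppercase as a precaution, since PubMed's search help documents them in that form. Approximating "the recent 5 years" as 1,825 days through `reldate` on the publication date (`datetype=pdat`) is also an assumption, and the function names are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Main query from the abstract, with operators in uppercase.
MAIN_QUERY = "(immune) AND (therapy) AND ((cancer) OR (neoplasm))"

# Organ-specific terms added to the main query, one per subgroup.
SUBGROUP_TERMS = {
    "bladder": "((urothelial) OR (bladder) OR (ureter))",
    "kidney": "((kidney) OR (renal))",
    "prostate": "(prostate)",
}


def build_queries() -> dict[str, str]:
    """Return the main query plus one combined query per subgroup."""
    queries = {"main": MAIN_QUERY}
    for name, terms in SUBGROUP_TERMS.items():
        queries[name] = f"{MAIN_QUERY} AND {terms}"
    return queries


def esearch_url(term: str, years: int = 5) -> str:
    """Format an esearch URL restricted to roughly the last `years` years."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",        # filter on publication date
        "reldate": years * 365,    # approximate window, in days
    }
    return f"{ESEARCH}?{urlencode(params)}"
```

Retrieving the full ID list for a collection of about 24,000 records may require paging or date-sliced requests; the E-utilities documentation describes the current limits.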
Results: TF and IDF scores (IDFS: IDF divided by the total number of analyzed articles) were successfully calculated and recorded by STRAYL. The program took 13 hours and 18 minutes to complete all 4 analyses. The main-group results show that the kidney is the most frequently mentioned GU organ (“renal” IDFS=0.036294, “kidney” IDFS=0.014543). The bladder (“bladder” IDFS=0.017257, “urothelial” IDFS=0.010770) received more attention than the prostate (“prostate” IDFS=0.025609, “prostatic” IDFS=0.001696). Across the 3 subgroup analyses, the word “trial” was mentioned most frequently in the bladder group (IDFS=0.286458), followed by the kidney group (IDFS=0.215803) and the prostate group (IDFS=0.203988). Notably, “metastasis” and “melanoma” both had higher IDF scores than any GU organ term in this analysis. The word “treatment” had a high IDFS in the subgroups but not in the main group.
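Taken term by term, "prostate" (0.025609) scores higher than "bladder" (0.017257), so the statement that the bladder received more attention holds only if the two terms for each organ are considered together. The abstract does not say how the terms were combined; the Python sketch below assumes a simple sum of the reported main-group IDFS values and ranks the organs. Because an article that uses both terms is counted twice, each sum is an upper bound on the share of articles that mention the organ. The organ-to-term grouping and the helper names are hypothetical.

```python
# Main-group IDFS values as reported in the abstract.
REPORTED_IDFS = {
    "renal": 0.036294, "kidney": 0.014543,
    "bladder": 0.017257, "urothelial": 0.010770,
    "prostate": 0.025609, "prostatic": 0.001696,
}

# Hypothetical grouping of dictionary terms by organ (not specified in the abstract).
ORGAN_TERMS = {
    "kidney": ["renal", "kidney"],
    "bladder": ["bladder", "urothelial"],
    "prostate": ["prostate", "prostatic"],
}


def idfs(doc_count: int, n_docs: int) -> float:
    """IDFS as defined in the abstract: article count divided by total articles."""
    return doc_count / n_docs


def organ_scores(scores: dict[str, float], organ_terms: dict[str, list[str]]) -> dict[str, float]:
    """Sum per-term IDFS for each organ.

    Articles using both terms are counted twice, so each sum is an upper
    bound on the share of articles that mention the organ at all.
    """
    return {organ: sum(scores[t] for t in terms) for organ, terms in organ_terms.items()}


def rank(scores: dict[str, float]) -> list[str]:
    """Return keys ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)
```

With the reported values, the sums are about 0.050837 (kidney), 0.028027 (bladder), and 0.027305 (prostate), giving the order kidney, bladder, prostate. The same `rank` helper applied to the subgroup IDFS values for "trial" returns bladder, kidney, prostate.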
Conclusions: Text-mining data from recent PubMed literature suggest that, within GU oncology, immunotherapy is currently applied mainly to urothelial malignancies and researched more in renal cancer. This result closely matches the current status of immuno-oncology (I/O) therapy in the TUA community. We conclude that the newly developed text-mining program STRAYL can help identify current medical trends on an assigned topic from PubMed literature in a relatively short time.
Folder: Abstracts
Posted by: TUA HR and Customer Service Group
Organization: Taiwan Urological Association (台灣泌尿科醫學會)
Created: 2019-06-27 21:16:14
Last revised: 2019-07-04 15:32:43