NDP34: Developing and Testing a Text-Mining Application as a Decision-Assistant Tool to Analyze Research Trends in Assigned Study Topics
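Department of Urology, Yang Ming Hospital, Chia-Yi City; ¹Division of Urology, Department of Surgery, Chiayi Chang Gung Memorial Hospital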
Jia-Jen Shee, Dong-Ru Ho¹, Yun-Ching Huang¹, Chih-Shou Chen¹
Department of Urology, Yang Ming Hospital, Chia-Yi City, Taiwan
¹Division of Urology, Department of Surgery, Chiayi Chang Gung Memorial Hospital, Chia-Yi, Taiwan
Purpose: Keeping up with leading trends across research fields is a persistent challenge for researchers. To help scientists follow trends in fields outside their own expertise, we analyzed published literature by keyword using text-mining methods.
Material and Method: The frequency with which scientific terms appear in the literature is a major indicator of research trends. We developed a program, named STRAYL, for text-mining analysis. A medical dictionary served as the source of the keyword files, and each keyword was matched against the abstract of every selected PubMed article. For each keyword, STRAYL calculated, recorded, and output the TF (Term Frequency, how often the word is used across all articles) and the IDF (Inverse Document Frequency, reported here as the proportion of articles that use the word).
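The counting step described above can be sketched in Python. This is a minimal illustration, not the STRAYL implementation: the function name `keyword_stats`, the simple lowercase tokenizer, the restriction to single-word keywords, and the three toy abstracts are all assumptions made for the example.

```python
import re
from collections import Counter


def keyword_stats(abstracts, keywords):
    """Return {keyword: (tf, doc_fraction)} for a list of abstract strings.

    tf           = total occurrences of the keyword across all abstracts
    doc_fraction = share of abstracts that mention the keyword at least once
                   (the quantity the abstract reports as "IDF")
    """
    keywords = {k.lower() for k in keywords}
    tf = Counter()
    df = Counter()
    for text in abstracts:
        # Hypothetical tokenizer: lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", text.lower())
        hits = [t for t in tokens if t in keywords]
        tf.update(hits)        # every occurrence counts toward TF
        df.update(set(hits))   # each abstract counts at most once
    n = len(abstracts)
    return {k: (tf[k], df[k] / n if n else 0.0) for k in sorted(keywords)}


# Toy abstracts, invented for illustration only (not PubMed data).
abstracts = [
    "Gene therapy for prostate cancer. Prostate tumours respond.",
    "Bladder cancer gene therapy vectors.",
    "Renal gene delivery in a mouse model.",
]
stats = keyword_stats(abstracts, ["cancer", "prostate", "bladder", "renal"])
for word, (count, fraction) in stats.items():
    print(f"{word}: TF={count}, fraction of abstracts={fraction:.3f}")
```

Multi-word dictionary entries would need phrase matching rather than token lookup, which this sketch leaves out.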
Result: STRAYL calculated and recorded TF and IDF after comparing a 97,664-word dictionary file against selected literature collections. After further sorting and charting of the STRAYL output, the most frequently used keywords could be presented clearly. As a first sample, we used “urology gene therapy” as the PubMed search term and collected the literature year by year from 2014 to 2018. The keyword “cancer” was mentioned in 62.3% of the literature (IDF 0.623) in 2014, rising gradually to 70.5% in 2018. The keyword “prostate” was mentioned more frequently than “bladder” (39.8% vs 20.1% in 2014 and 41.1% vs 21.8% in 2018) and “renal” (16.7% in 2014 and 20.1% in 2018). These data suggest that prostate cancer gene therapy currently carries more weight as a research trend than the bladder or renal fields.
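The comparison above can be restated as simple arithmetic on the reported percentages. The sketch below uses only the 2014 and 2018 figures quoted in this abstract; the variable names are illustrative, and the intermediate years are not reproduced because they are not reported here.

```python
# Share of articles mentioning each keyword, as reported: (2014, 2018), in percent.
share = {
    "cancer": (62.3, 70.5),
    "prostate": (39.8, 41.1),
    "bladder": (20.1, 21.8),
    "renal": (16.7, 20.1),
}

# Change from 2014 to 2018 in percentage points.
change = {word: round(end - start, 1) for word, (start, end) in share.items()}

# Organ keywords ranked by their 2018 share, highest first.
organs = ["prostate", "bladder", "renal"]
ranking_2018 = sorted(organs, key=lambda w: share[w][1], reverse=True)

for word in share:
    start, end = share[word]
    print(f"{word}: {start}% -> {end}% ({change[word]:+.1f} points)")
print("2018 ranking:", ", ".join(ranking_2018))
```

On these figures, "prostate" has the largest share among the three organ keywords in both years, while "renal" shows the largest gain among them.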
Conclusion: Our newly developed text-mining program, “STRAYL”, can mine published literature to calculate keyword TF and IDF, and can help identify research trends in each assigned study topic.
