利用機器學習模型預測腎結石疾病風險
張聲浩1、耿俊閎2,3
1高雄醫學大學 學士後醫學系; 2高雄市立小港醫院 泌尿科; 3高雄醫學大學附設中和紀念醫院 泌尿科
Kidney Stone Disease Risk Prediction Using Machine Learning Models
Sheng-Hao Chang1, Jiun-Hung Geng2,3
1 Department of Post Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan;
2 Department of Urology, Kaohsiung Municipal Siaogang Hospital, Kaohsiung, Taiwan;
3 Department of Urology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Purpose: Kidney stone disease (KSD) is a prevalent urological disorder that affects approximately 10% of the population in Taiwan and is associated with substantial morbidity, recurrence, and healthcare burden. Identifying individuals at high risk of KSD is crucial for effective prevention and early intervention, and recent advances in artificial intelligence (AI) and machine learning (ML) offer new opportunities for disease prediction in clinical settings. This study aimed to develop and compare ML models for predicting KSD risk using data from the Taiwan Biobank.
Materials and Methods: Data were obtained from 121,802 participants in the Taiwan Biobank. After excluding individuals with incomplete data, demographic, lifestyle, anthropometric, and biochemical variables were included for model training. Five ML algorithms were trained to predict the risk of KSD: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), AdaBoost, and Extreme Gradient Boosting (XGBoost). The dataset was randomly divided into training (80%) and testing (20%) subsets, and the Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Model performance was evaluated using 10-fold cross-validation with the following metrics: accuracy, area under the receiver operating characteristic curve (AUROC), F1-score, precision, and recall.
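The oversampling step can be illustrated with a minimal pure-Python sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, the choice of k, the seed, and the toy two-feature data are all hypothetical; the abstract does not report the implementation or parameters actually used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    Illustrative sketch only: parameters and data layout are assumptions,
    not the study's actual configuration.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature vectors for the minority (KSD) class.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
majority_count = 20  # hypothetical size of the majority class

new_samples = smote(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + new_samples
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, it always falls within the range of the observed minority values.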
Results: Among the participants, 7,749 (6.4%) had a history of KSD. Significant predictors included male sex, older age, higher BMI, increased waist circumference, elevated uric acid and creatinine levels, and lower albumin and HDL-C levels (all p < 0.001). Among the tested models, AdaBoost achieved the best predictive performance, with a mean accuracy of 0.889, AUROC of 0.949, F1-score of 0.887, precision of 0.899, and recall of 0.876 in the testing dataset, outperforming RF (AUROC = 0.831) and LR (AUROC = 0.723).
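As a reading aid, the reported metrics can be related to one another with the standard definitions. The sketch below computes precision, recall, and accuracy from confusion-matrix counts (the counts are made up for illustration and are not from the study), then recomputes the F1-score as the harmonic mean of the precision and recall reported for AdaBoost.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that are correctly identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 80, 90, 10, 20
print(round(precision(tp, fp), 3))         # 0.889
print(round(recall(tp, fn), 3))            # 0.8
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.85

# Precision and recall as reported for AdaBoost in the Results.
f1 = f1_score(0.899, 0.876)
print(round(f1, 3))  # 0.887
```

The recomputed value matches the F1-score of 0.887 reported above, so the three figures are mutually consistent.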
Conclusions: This study established a machine learning–based predictive model for kidney stone disease using large-scale population data. The AdaBoost algorithm showed higher accuracy and better discrimination than the other algorithms tested, including conventional logistic regression. Incorporating ML-driven prediction into clinical workflows may facilitate personalized risk assessment, optimize preventive strategies, and reduce the disease burden of kidney stones.