A Comparative Analysis of the Machine Learning Methods for Predicting Diabetes
DOI:
https://doi.org/10.31181/jopi21202421
Keywords:
Diabetes Prediction, Machine Learning, Classification Models, Healthcare, Comparative Analysis
Abstract
Diabetes can lead to a range of health problems and complications, including cardiovascular disease, kidney damage (nephropathy), eye problems, neuropathy, and foot ailments. Early diagnosis can therefore help prevent these conditions from developing, and machine learning is one way to detect diabetes at an early stage. In this study, we compare the performance of nine machine-learning classification models in predicting diabetes: XGBoost, gradient boosting, AdaBoost, logistic regression, decision tree, k-nearest neighbors (KNN), perceptron, random forest, and naïve Bayes. We use several evaluation metrics, focusing on the F1-score, the area under the curve (AUC), and computational runtime. Our comparison shows that the complex tree-based models achieve the highest F1-score and AUC, at the cost of longer execution times.
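As an illustration of the three evaluation criteria named in the abstract, the following is a minimal sketch in plain Python of how F1-score, AUC, and runtime could be computed for a single classifier. It is not the study's code: the helper names (`f1_score`, `roc_auc`, `evaluate`), the 0.5 decision threshold, the toy labels and scores, and the stand-in `toy_model` are all assumptions made for this example.

```python
import time


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(predict_scores, X, y_true, threshold=0.5):
    """Score one model: time the prediction call, then compute F1 and AUC."""
    start = time.perf_counter()
    scores = predict_scores(X)
    runtime_s = time.perf_counter() - start
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
        "runtime_s": runtime_s,
    }


# Hypothetical toy data: one feature per patient, reused directly as the score.
X = [[0.9], [0.4], [0.6], [0.3], [0.2], [0.7]]
y_true = [1, 0, 1, 1, 0, 0]


def toy_model(rows):
    # Stand-in for a trained classifier's probability output (assumption).
    return [row[0] for row in rows]


result = evaluate(toy_model, X, y_true)
print(result)
```

In a comparison like the one described, the same `evaluate` call would be repeated for each of the nine models, and the resulting F1, AUC, and runtime values set side by side. Note that this sketch times only the prediction call; the abstract does not say whether the reported runtime covers training, prediction, or both.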
Copyright (c) 2024 Journal of Operations Intelligence
This work is licensed under a Creative Commons Attribution 4.0 International License.