A Comparative Analysis of the Machine Learning Methods for Predicting Diabetes

Authors

DOI:

https://doi.org/10.31181/jopi21202421

Keywords:

Diabetes Prediction, Machine Learning, Classification Models, Healthcare, Comparative Analysis

Abstract

Diabetes can lead to serious complications, including cardiovascular disease, kidney damage (nephropathy), eye problems, neuropathy, and foot ailments, so early diagnosis can help prevent these conditions from developing. Machine learning is one way to detect diabetes at an early stage. In this study, we compare the performance of nine machine learning classification models for predicting diabetes: XGBoost, gradient boosting, AdaBoost, logistic regression, decision tree, k-nearest neighbors (KNN), perceptron, random forest, and naïve Bayes. We use several evaluation metrics, focusing on the F1-score, the area under the curve (AUC), and computational runtime. Our comparison shows that the more complex tree-based models achieve the highest F1-score and AUC, albeit with longer execution times.
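
As a rough illustration of the three evaluation criteria named in the abstract, the following minimal Python sketch computes an F1-score, an AUC, and a wall-clock runtime for a single scoring function. It is not the authors' code: the function names (`f1_score`, `roc_auc`, `evaluate`), the 0.5 decision threshold, the toy labels and scores, and the choice to time only the scoring call are all assumptions made for illustration, and AUC is taken here to mean the area under the ROC curve.

```python
import time


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(predict_scores, X, y_true, threshold=0.5):
    """Score one model: F1 at a fixed threshold, AUC, and scoring time in seconds.

    `predict_scores` is a hypothetical callable returning one score per row of X.
    Only the scoring call is timed here; training time is not included.
    """
    start = time.perf_counter()
    scores = predict_scores(X)
    runtime = time.perf_counter() - start
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
        "runtime_s": runtime,
    }


# Toy data (made up for illustration, not taken from the study).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
toy_result = evaluate(lambda X: toy_scores, X=None, y_true=toy_labels)
```

Calling `evaluate` once per model and collecting the returned dictionaries would give a comparison table along the lines the abstract describes. In practice a library implementation of these metrics would normally be preferred over hand-written ones.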

Published

2024-05-18

How to Cite

Maydanchi, M., Ziaei, M., Mohammadi, M., Ziaei, A., Basiri, M., Haji, F., & Gharibi, K. (2024). A Comparative Analysis of the Machine Learning Methods for Predicting Diabetes. Journal of Operations Intelligence, 2(1), 230-251. https://doi.org/10.31181/jopi21202421