Comparison of Machine Learning Approaches for Detecting COVID-19-Lockdown-Related Discussions During Recovery and Lockdown Periods

DOI:

https://doi.org/10.31181/jopi1120233

Keywords:

COVID-19, Sentiment analysis, Lockdown, Machine Learning, Data balancing, SMOTE

Abstract

Since COVID-19 was declared a pandemic, governments around the world have implemented multiple phases of lockdown measures to curb the spread of the virus. These measures have been accompanied by widespread fear and panic, driven in part by social media discussions. Because individuals hold diverse opinions about lockdown measures, both while they are in force and after they end, positive and negative lockdown-related discussions should be differentiated to better understand the major related issues and to inform future messaging and policy choices. We conduct a sentiment analysis (SA) of COVID-19 lockdown-related tweets using different machine learning (ML) classifiers and evaluate their performance before and after applying the synthetic minority oversampling technique (SMOTE). The research is performed in five phases: data collection, pre-processing of the dataset, annotation, application of SMOTE, and classification with ML classifiers. We observe an improvement in accuracy (Acc), confirmed by the Matthews correlation coefficient (MCC), across most classifiers. The exception is the k-nearest neighbour (KNN) classifier, whose Acc decreased from 0.82 to 0.59 and whose MCC decreased from 0.544 to 0.279 after SMOTE was applied. Although SMOTE shows potential with some classifiers, it cannot be considered a universal solution, particularly for other classifiers and datasets. The study highlights the need to evaluate and benchmark the integration of data balancing approaches with ML classifiers and to consider additional metrics, such as MCC, for binary classification problems, especially in SA.
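As an illustration of the two ideas the abstract relies on, the following minimal Python sketch shows (a) why MCC can expose a weakness that accuracy hides on imbalanced labels and (b) the interpolation step at the core of SMOTE. It is not the authors' implementation: the toy labels, the toy feature points, and the helper names `accuracy`, `mcc`, and `smote` are assumptions made for this example, and it uses only the Python standard library rather than the classifiers evaluated in the study.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours. Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1) tweets.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print("Acc:", accuracy(y_true, y_pred))  # 0.9
print("MCC:", mcc(y_true, y_pred))       # 0.0

# Hypothetical 2-D feature vectors for four minority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote(minority, n_new=6)
print(len(synthetic), "synthetic minority points generated")
```

In this toy case the always-majority classifier reaches an accuracy of 0.9 while its MCC is 0.0, which is the kind of gap that motivates reporting MCC alongside Acc. The synthetic points lie on line segments between existing minority points, which suggests one reason why oversampling may affect distance-based classifiers such as KNN differently from others, although the sketch itself does not reproduce the reported results.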

Author Biographies

  • Mohammed Rashad Baker, Software Department, College of Computer Science and Information Technology, University of Kirkuk, Kirkuk, Iraq

  • A.H. Alamoodi, Faculty of Computing and Meta-Technology (FKMT), Universiti Pendidikan Sultan Idris (UPSI), Perak, Malaysia

  • O.S. Albahri, Victorian Institute of Technology, Australia

  • A.S. Albahri, Department of Computer Technology Engineering, College of Information Technology, Imam Ja'afar Al-Sadiq University, Baghdad, Iraq

  • Salem Garfan, Faculty of Computing and Meta-Technology (FKMT), Universiti Pendidikan Sultan Idris (UPSI), Perak, Malaysia

  • Amneh Alamleh, Department of Artificial Intelligence, Faculty of Information Technology, Zarqa University, Zarqa, Jordan

  • Moceheb Lazam Shuwandy, Computer Science Department, College of Computer Science and Mathematics, Tikrit University (TU), Tikrit, Iraq

  • Ibrahim Alshakhatreh, Department of Business Administration, College of Management, National Yunlin University of Science and Technology, Yunlin, Taiwan

Published

2023-10-24

Data Availability Statement

The dataset is available on request.

How to Cite

Baker, M. R., Alamoodi, A., Albahri, O., Albahri, A., Garfan, S., Alamleh, A., Shuwandy, M. L., & Alshakhatreh, I. (2023). Comparison of Machine Learning Approaches for Detecting COVID-19-Lockdown-Related Discussions During Recovery and Lockdown Periods. Journal of Operations Intelligence, 1(1), 11-29. https://doi.org/10.31181/jopi1120233