Comparison of Machine Learning Approaches for Detecting COVID-19-Lockdown-Related Discussions During Recovery and Lockdown Periods
DOI:
https://doi.org/10.31181/jopi1120233
Keywords:
COVID-19, Sentiment analysis, Lockdown, Machine Learning, Data balancing, SMOTE
Abstract
Since COVID-19 was declared a pandemic, governments around the world have imposed successive phases of lockdown to curb the spread of the virus. These measures have been accompanied by widespread fear and panic, much of it driven by discussions on social media. Because individuals hold diverse opinions about lockdown measures, both while they are in force and after they end, positive and negative lockdown-related discussions should be distinguished in order to understand the main underlying issues and to inform future messaging and policy choices. We conduct a sentiment analysis (SA) of COVID-19 lockdown-related tweets using several machine learning (ML) classifiers and evaluate their performance before and after applying the synthetic minority oversampling technique (SMOTE). The study proceeds in five phases: data collection, pre-processing, annotation, application of SMOTE, and classification with ML classifiers. After SMOTE, accuracy (Acc) improves for most classifiers, as confirmed by the Matthews correlation coefficient (MCC). The exception is k-nearest neighbours (KNN), whose Acc fell from 0.82 to 0.59 and whose MCC fell from 0.544 to 0.279 once SMOTE was applied. Although SMOTE benefits some classifiers, it cannot be regarded as a universal solution, particularly for other classifiers and datasets. The study highlights the need to evaluate and benchmark data balancing approaches in combination with ML classifiers, and to report additional metrics such as MCC for binary classification problems, especially in SA.
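The abstract reports Acc alongside MCC and applies SMOTE before training. The following is a minimal, pure-Python sketch of both ideas, not the authors' implementation: the function names `accuracy`, `mcc`, and `smote` are hypothetical, features are assumed to be dense numeric vectors (for example, rows of a TF-IDF matrix), and the oversampler is a simplified SMOTE variant that picks base samples at random. In practice, scikit-learn's `matthews_corrcoef` and imbalanced-learn's `SMOTE` would be the more likely tools.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 = negative, 1 = positive).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself.
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    # An always-negative classifier on a 90/10 split looks accurate but has no skill.
    y_true = [0] * 90 + [1] * 10
    y_pred = [0] * 100
    print(accuracy(y_true, y_pred))  # 0.9
    print(mcc(y_true, y_pred))       # 0.0

    # Oversample a tiny two-feature minority class using k = 2 neighbours.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(len(smote(minority, n_new=4, k=2)))  # 4
```

The demo shows why MCC is a useful companion to Acc: a classifier that ignores the minority class scores 0.9 accuracy but an MCC of zero. Oversampling is normally applied to the training split only, so that synthetic points never reach the held-out evaluation data.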
Data Availability Statement
The dataset is available on request.
License
Copyright (c) 2023 Mohammed Rashad Baker, A.H. Alamoodi, O.S. Albahri, A.S. Albahri, Salem Garfan, Amneh Alamleh, Moceheb Lazam Shuwandy, Ibrahim Alshakhatreh (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.