Lekia Nkpordee¹, Ibinabo Magnus Ogolo²
¹Department of Mathematics and Statistics, Kampala International University, Kampala, Uganda.
²School of Foundation Studies, Rivers State College of Health Science and Management Technology.
Abstract
This research applies statistical and machine learning techniques to investigate and help mitigate the spread of 5G-coronavirus misinformation on Twitter, using the COVID-19 Misinformation Tweets Labelled Dataset. It analyzes how misinformation is distributed across four periods of the day (morning, afternoon, evening, and night) using both categorical and probabilistic approaches. Sentiment analysis based on natural language processing (NLP) characterizes the emotional content of the tweets, while logistic regression, Random Forest, and Naïve Bayes classifiers predict the likelihood that a tweet is malicious from predictors such as the number of followers, the number of friends, and the hour at which the tweet was posted. The results indicate that malicious activity is concentrated in the late-night and early-morning hours, with both the highest volume and the highest proportion of malicious tweets occurring in the morning. The results also indicate that malicious tweets are slightly more negative in sentiment than non-malicious tweets. Of the three classifiers, Random Forest achieves the best discriminative performance (AUC = 0.916), making it the most effective of the three at identifying malicious content on Twitter.
Keywords: Health misinformation, Social media analytics, Sentiment analysis, Probabilistic modeling, COVID-19 infodemic
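The time-of-day analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state the hour boundaries of each period, so the cut-offs below are assumptions, as are the function names and the toy data; this is not the authors' implementation.

```python
from collections import Counter


def period_of_day(hour: int) -> str:
    """Map an hour (0-23) to a period of the day.

    The boundaries are assumed for illustration only; the abstract
    does not specify them.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"  # 22:00-05:59 under the assumed boundaries


def malicious_share_by_period(tweets):
    """Return the proportion of malicious tweets within each period.

    `tweets` is an iterable of (hour, is_malicious) pairs.
    """
    totals = Counter()
    malicious = Counter()
    for hour, is_malicious in tweets:
        period = period_of_day(hour)
        totals[period] += 1
        malicious[period] += int(is_malicious)
    return {period: malicious[period] / totals[period] for period in totals}


# Hypothetical toy records, not taken from the dataset.
toy_tweets = [
    (7, True), (8, True), (9, False),   # morning
    (14, False), (15, True),            # afternoon
    (20, False),                        # evening
    (2, True), (3, False),              # night
]
shares = malicious_share_by_period(toy_tweets)
```

Reporting the share within each period, rather than raw counts alone, separates periods that are merely busy from periods in which a tweet is more likely to be malicious.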
References
- Ahmed, W., Vidal-Alaball, J., Downing, J. and López Seguí, F. (2020) ‘COVID-19 and the 5G conspiracy theory: social network analysis of Twitter data’, Journal of Medical Internet Research, 22(5), e19458. https://doi.org/10.2196/19458
- Arashnic, N. (2021) COVID-19 Misinformation Tweets Labeled Dataset [Data set]. Kaggle. Available at: https://www.kaggle.com/datasets/arashnic/misinfo-graph
- do Nascimento, I.J.B., Pizarro, A.B., Almeida, J.M., Azzopardi-Muscat, N., Gonçalves, M.A., Björklund, M. and Novillo-Ortiz, D. (2022) ‘Infodemics and health misinformation: A systematic review of reviews’, Bulletin of the World Health Organization, 100(9), pp. 544–561. https://doi.org/10.2471/BLT.21.287654
- Flaherty, E., Sturm, T. and Farries, E. (2022) ‘The conspiracy of Covid-19 and 5G: spatial analysis fallacies in the age of data democratization’, Social Science & Medicine, 293, 114546. https://doi.org/10.1016/j.socscimed.2021.114546
- Ishizumi, A., Kolis, J., Abad, N., Prybylski, D., Brookmeyer, K.A., Voegeli, C., Wardle, C. and Chiou, H. (2024) ‘Beyond misinformation: Developing a public health prevention framework for managing information ecosystems’, The Lancet Public Health, 9(6), pp. e397–e406. https://doi.org/10.1016/S2468-2667(24)00031-8
- Kyabaggu, R., Marshall, D., Ebuwei, P. and Ikenyei, U. (2022) ‘Health literacy, equity, and communication in the COVID-19 era of misinformation: Emergence of health information professionals in infodemic management’, JMIR Infodemiology, 2(1), e35012. https://doi.org/10.2196/35012
- Mouratidis, D. (2025) ‘From misinformation to insight: Machine learning strategies for fake news detection’, Information, 16(3), 189. https://doi.org/10.3390/info16030189
- Townsend, L. and Wallace, C. (2022) ‘Social media research ethics and the law: Considerations for social data analysis’, Journal of Information, Communication and Ethics in Society, 20(3), pp. 345–358.
