Abstract
Despite abundant scientific evidence supporting the effectiveness of COVID-19 vaccines, vaccine hesitancy has recently surged worldwide, driven primarily by misinformation spread on social media platforms. It is crucial to address this issue and raise awareness of the importance of vaccination in combating COVID-19. Predicting community sentiment from social media posts can provide valuable insight into vaccine hesitancy and help health workers and medical professionals take precautionary measures. However, the lack of high-quality labeled data makes it difficult to build an effective COVID-19 sentiment classifier, and the labeled datasets that are available suffer from severe class imbalance. To address these challenges, this article presents a COVID-19 sentiment prediction framework. First, a deep adversarial active learning framework leverages abundant unlabeled data by training autoencoder and discriminator components adversarially to select the most informative unlabeled samples. Second, to mitigate the effects of imbalanced labeled datasets, a resampling phase is incorporated into the adversarial training loop. The proposed framework, named Resampling Supported Deep Adversarial Active Learning (RS-DAAL), is evaluated on two different datasets comprising social media posts from Twitter and Reddit. Various resampling techniques, including undersampling, oversampling, and hybrid methods, are assessed, with oversampling techniques further tested at different levels of resampling. The framework is compared against a baseline model without any resampling layer and against current state-of-the-art methods. Experimental results and statistical analysis indicate that RS-DAAL outperforms these alternatives in identifying COVID-19 sentiments on social media platforms.
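To make the idea of oversampling "at different levels" concrete, the following is a minimal sketch, not the authors' implementation. It swaps in plain random oversampling (duplicating existing minority samples) for whichever techniques the article actually evaluates. The function name `random_oversample`, the `level` parameter (the fraction of the majority-class count that every class is raised to), and the toy labels are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def random_oversample(samples, labels, level=1.0, seed=0):
    """Hypothetical helper: duplicate minority-class samples at random until
    every class has at least ceil(level * majority_count) samples.

    level=1.0 balances all classes fully; level=0.5 raises each minority
    class to half the size of the majority class.
    """
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    rng = random.Random(seed)
    counts = Counter(labels)
    target = math.ceil(level * max(counts.values()))

    # Start from the original data; only ever append duplicates.
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(max(0, target - n)):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy imbalanced labeled pool (made-up data): 7 neutral, 2 positive, 1 negative.
posts = [f"post_{i}" for i in range(10)]
labels = ["neutral"] * 7 + ["positive"] * 2 + ["negative"] * 1

half_x, half_y = random_oversample(posts, labels, level=0.5)
full_x, full_y = random_oversample(posts, labels, level=1.0)
```

In an active learning loop of the kind the abstract describes, a step like this would presumably be applied to the labeled pool each time newly annotated samples are added and before the classifier is retrained. Where exactly the resampling phase sits inside the adversarial training loop is not specified in the abstract.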
Funding
This study received no financial support from any organization or individual.
Author information
Contributions
Conceptualization: SC; Methodology: SC; Formal analysis and investigation: SB, SC; Writing: SB, SC; Writing (review and editing): SC, SB, AKD; Supervision: AKD, SB.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
The original data is available at https://www.kaggle.com/datasets/gpreda/all-covid19-vaccines-tweets (Kaggle). Annotated data will be made available upon request to the corresponding author.
Code availability
The code will be made available upon request to the corresponding author.
Additional information
Editors: Nuno Moniz, Paula Branco, Luís Torgo, Nathalie Japkowicz, Michal Wozniak and Shuo Wang.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chatterjee, S., Bhattacharjee, S., Das, A.K. et al. Imbalanced COVID-19 vaccine sentiment classification with synthetic resampling coupled deep adversarial active learning. Mach Learn 113, 8027–8059 (2024). https://doi.org/10.1007/s10994-024-06562-7
DOI: https://doi.org/10.1007/s10994-024-06562-7