Using Machine Learning-Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions

J Med Internet Res. 2021 Aug 5;23(8):e26478.
doi: 10.2196/26478.

Jingcheng Du et al. J Med Internet Res. 2021.

Abstract

Background: The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion.

Objective: The aim of this study is to develop and evaluate an intelligent automated protocol for identifying and classifying human papillomavirus (HPV) vaccine misinformation on social media using machine learning (ML)-based methods.

Methods: Reddit posts (from 2007 to 2017, N=28,121) that contained keywords related to HPV vaccination were compiled. A random subset (2200/28,121, 7.82%) was manually labeled for misinformation and served as the gold standard corpus for evaluation. Five ML-based algorithms designed to identify vaccine misinformation (a support vector machine, logistic regression, extremely randomized trees, a convolutional neural network, and a recurrent neural network) were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation.
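
The corpus-building steps described in the Methods (keyword filtering, then drawing a random subset for manual labeling) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the keyword list, the `Post` record type, and the seeded sampler are hypothetical assumptions, since the abstract does not give the actual keywords or sampling code.

```python
import random
import re
from dataclasses import dataclass
from typing import List

# Hypothetical keyword list: the study's actual HPV-related keywords are not given in the abstract.
HPV_KEYWORDS = ["hpv vaccine", "hpv vaccination", "gardasil", "cervarix"]

# One case-insensitive pattern that matches any keyword as a literal substring.
_KEYWORD_RE = re.compile("|".join(re.escape(k) for k in HPV_KEYWORDS), re.IGNORECASE)


@dataclass
class Post:
    """Hypothetical record for a single Reddit post."""
    post_id: str
    text: str


def filter_by_keywords(posts: List[Post]) -> List[Post]:
    """Keep only posts whose text contains at least one HPV keyword."""
    return [p for p in posts if _KEYWORD_RE.search(p.text)]


def sample_for_labeling(posts: List[Post], k: int, seed: int = 0) -> List[Post]:
    """Draw k distinct posts uniformly at random (without replacement) for manual annotation.

    A fixed seed makes the labeled subset reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(posts, k)
```

For the corpus sizes reported above, `100 * 2200 / 28121` is about 7.82, matching the share of posts that was manually labeled.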

Results: The convolutional neural network model achieved the highest area under the receiver operating characteristic curve (AUC), 0.7943. Of the 28,121 Reddit posts, 7207 (25.63%) were classified as vaccine misinformation; discussions of general safety issues were the leading type of misinformed post (2666/7207, 36.99%).
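
The AUC reported for the CNN measures how well a model's scores rank misinformation posts above other posts. A minimal, dependency-free sketch of that metric is below, using the Mann-Whitney (pairwise ranking) formulation. The abstract does not say how the authors computed the AUC; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used. Labels are assumed to be 1 for misinformation and 0 otherwise.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive (label 1) receives a
    higher score than a randomly chosen negative (label 0); ties count as one half.
    This pairwise O(P * N) version favors clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.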

Conclusions: ML-based approaches are effective in identifying and classifying HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity to meet the challenge of intelligent, automated monitoring and classification of public health misinformation on social media platforms. The timely identification of vaccine misinformation on the internet is the first step toward misinformation correction and vaccine promotion.

Keywords: HPV vaccine; Reddit; deep learning; infodemiology; infoveillance; machine learning; misinformation; social media.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Overview of human papillomavirus misinformation identification and classification on Reddit. (a) Evaluation of machine learning–based misinformation identification and (b) topic modeling. ML: machine learning.
Figure 2
Performance of machine learning algorithms in identifying human papillomavirus misinformation. (a) Receiver operating characteristic curves and (b) precision-recall curves for the convolutional neural network. AUC: area under the curve; ET: extremely randomized trees; CNN: convolutional neural network; LR: logistic regression; RNN: recurrent neural network; SVM: support vector machine.
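
Each point on a precision-recall curve like the one in Figure 2(b) comes from choosing a score cutoff and counting the resulting true and false positives. A minimal sketch of that computation follows. It is an illustration, not the authors' plotting code: labels are assumed to be 1 for misinformation and 0 otherwise, and a post is assumed to be predicted positive when its score is at or above the cutoff.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    labels: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for every distinct score used as a cutoff.

    Thresholds are visited from highest to lowest, so recall never decreases.
    The loop is quadratic, chosen for clarity rather than speed.
    """
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    points = []
    for t in sorted(set(scores), reverse=True):
        # Every post scoring at or above t is predicted to be misinformation.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        # tp + fp >= 1 because t is one of the observed scores.
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points
```

Lowering the cutoff admits more posts, which raises recall but usually lowers precision; the curve makes that trade-off visible for choosing an operating point.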
