Abstract
Along with the COVID-19 pandemic, we are also fighting an ‘infodemic’. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm, and the risk is exacerbated during a pandemic. To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19. We perform a binary classification task (real vs. fake) and benchmark the annotated dataset with four machine learning baselines: Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM). We obtain the best performance of 93.32% F1-score with SVM on the test set. The data and code are available at: https://github.com/parthpatwa/covid19-fake-news-dectection.
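For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score can be computed for a two-class (real vs. fake) task. It is not the authors' evaluation code. The abstract does not state how F1 is averaged across the two classes, so the unweighted (macro) mean used here is an assumption, and the function names and toy labels are hypothetical.

```python
from typing import Sequence


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        # No correct predictions for this class, so F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = ("real", "fake"),
) -> float:
    """Unweighted mean of per-class F1 (an assumed averaging choice)."""
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Toy labels invented for illustration; they are not drawn from the dataset.
gold = ["real", "real", "fake", "fake", "real", "fake"]
pred = ["real", "fake", "fake", "fake", "real", "real"]

if __name__ == "__main__":
    print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

If the classes are imbalanced, a support-weighted average would give a different number than the macro mean, so the averaging choice should be checked against the paper before comparing results.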
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Patwa, P. et al. (2021). Fighting an Infodemic: COVID-19 Fake News Dataset. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_3
DOI: https://doi.org/10.1007/978-3-030-73696-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73695-8
Online ISBN: 978-3-030-73696-5
eBook Packages: Computer Science, Computer Science (R0)