Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally, causing a pandemic with more than 152 million infections and more than three million deaths as of May 2021. To address the COVID-19 pandemic by limiting transmission, an intense global effort is under way to develop safe and effective vaccines. Vaccine development generally requires several years of pre-clinical and clinical evaluation as well as strict regulatory approval. However, because of the unprecedented worldwide impact of COVID-19, the development and testing of new vaccines are being accelerated. Several vaccines against COVID-19 are currently authorized but not yet approved, others are in clinical or pre-clinical evaluation, and many more are being researched. In this work, we use natural language processing and a machine learning model to predict good candidate vaccines. We built an unsupervised deep learning model (CVW2V) that uses Word2vec to produce word embeddings from a corpus of published articles, focusing selectively on COVID-19 candidate vaccines that appear in the literature, in order to identify promising target vaccines according to their similarity to approved and authorized vaccines.
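The similarity-based ranking described in the abstract can be illustrated with a minimal sketch. It assumes word embeddings have already been trained (for example with Word2vec) and ranks candidate terms by their mean cosine similarity to a set of reference terms. The vectors, the placeholder names (`reference_x`, `candidate_a`, and so on), the helper functions, and the choice of mean cosine similarity as the score are all assumptions made for illustration; they are not taken from the CVW2V model or its results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, references, candidates):
    """Sort candidates by mean cosine similarity to the reference terms.

    The mean is an assumed aggregation; other choices (max, weighted
    mean) would fit the same description.
    """
    scores = {}
    for name in candidates:
        sims = [cosine(embeddings[name], embeddings[ref]) for ref in references]
        scores[name] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors with placeholder names (hypothetical data,
# not output of any trained model).
toy_embeddings = {
    "reference_x": [1.0, 0.0, 0.0],
    "reference_y": [0.8, 0.6, 0.0],
    "candidate_a": [0.9, 0.3, 0.0],
    "candidate_b": [0.0, 0.0, 1.0],
}

ranking = rank_candidates(
    toy_embeddings,
    references=["reference_x", "reference_y"],
    candidates=["candidate_a", "candidate_b"],
)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting `candidate_a` ranks first because its vector points in nearly the same direction as both reference vectors, while `candidate_b` is orthogonal to them. With a real model, the dictionary would be replaced by the trained embedding lookup.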
Acknowledgments
The authors would like to thank Dr. Alvis Fong and Dr. Pnina Ari-Gur for their valuable suggestions in the development of this work. Furthermore, we thank the anonymous reviewers for their valuable feedback and suggestions.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gharaibeh, T., de Doncker, E. (2021). Unsupervised Learning Model to Uncover. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12950. Springer, Cham. https://doi.org/10.1007/978-3-030-86960-1_38
DOI: https://doi.org/10.1007/978-3-030-86960-1_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86959-5
Online ISBN: 978-3-030-86960-1
eBook Packages: Computer Science (R0)