Unsupervised Learning Model to Uncover Hidden Knowledge from COVID-19 Vaccines Literature

  • Conference paper
Computational Science and Its Applications – ICCSA 2021 (ICCSA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12950)

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally, causing a pandemic with more than 152 million infections and more than three million deaths as of May 2021. To address the COVID-19 pandemic by limiting transmission, an intense global effort is under way to develop a safe and effective vaccine, a process that generally requires several years of pre-clinical and clinical evaluation as well as strict regulatory approval. However, because of the unprecedented worldwide impact of COVID-19, the development and testing of new vaccines are being accelerated. Several vaccines are currently authorized, though not yet approved, to fight COVID-19; others are in clinical evaluation or at a pre-clinical stage, and many more are being researched. In this work, we used natural language processing and a machine learning model to predict good candidate vaccines. We built an unsupervised deep learning model (CVW2V) that uses Word2vec to produce word embeddings from a corpus of published articles, focusing selectively on COVID-19 candidate vaccines that appear in the literature, in order to identify promising target vaccines according to their similarity with approved and authorized vaccines.
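
The similarity-based ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' CVW2V implementation: the embedding vectors are made-up toy values (in practice they would come from a Word2vec model trained on the literature corpus), the token names are hypothetical placeholders, and averaging cosine similarity over the reference vaccines is an assumed scoring rule chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, references, candidates):
    """Rank candidates by mean cosine similarity to the reference tokens.

    The mean is an assumed aggregation, not taken from the paper.
    """
    scores = {}
    for cand in candidates:
        sims = [cosine(embeddings[cand], embeddings[ref]) for ref in references]
        scores[cand] = sum(sims) / len(sims)
    # Highest mean similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-dimensional vectors with hypothetical token names; real Word2vec
# embeddings typically have a few hundred dimensions.
toy_embeddings = {
    "authorized_a": [0.9, 0.1, 0.0],
    "authorized_b": [0.8, 0.2, 0.1],
    "candidate_x": [0.85, 0.15, 0.05],
    "candidate_y": [0.0, 0.1, 0.9],
}

ranking = rank_candidates(
    toy_embeddings,
    references=["authorized_a", "authorized_b"],
    candidates=["candidate_x", "candidate_y"],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values, `candidate_x` ranks above `candidate_y` because its vector points in nearly the same direction as both reference vectors. With a trained model, the same ranking step would be applied to the learned vectors of vaccine names found in the corpus.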

Acknowledgments

The authors would like to thank Dr. Alvis Fong and Dr. Pnina Ari-Gur for their valuable suggestions in the development of this work. Furthermore, we thank the anonymous reviewers for their valuable feedback and suggestions.

Author information

Corresponding author

Correspondence to Tasnim Gharaibeh.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Gharaibeh, T., de Doncker, E. (2021). Unsupervised Learning Model to Uncover Hidden Knowledge from COVID-19 Vaccines Literature. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12950. Springer, Cham. https://doi.org/10.1007/978-3-030-86960-1_38

  • DOI: https://doi.org/10.1007/978-3-030-86960-1_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86959-5

  • Online ISBN: 978-3-030-86960-1

  • eBook Packages: Computer Science, Computer Science (R0)
