Empowering Virus Sequence Research Through Conceptual Modeling

  • Conference paper
Conceptual Modeling (ER 2020)

Abstract

The pandemic outbreak of the coronavirus disease has attracted attention towards the genetic mechanisms of viruses. We hereby present the Viral Conceptual Model (VCM), centered on the virus sequence and described from four perspectives: biological (virus type and hosts/sample), analytical (annotations, nucleotide and amino acid variants), organizational (sequencing project) and technical (experimental technology).

VCM is inspired by GCM, our previously developed Genomic Conceptual Model, but it introduces many novel concepts, as viral sequences differ significantly from human genomes. When applied to the SARS-CoV-2 virus, complex conceptual queries on VCM are able to replicate the search results of recent articles, demonstrating its potential to support virology research.

Our effort is part of a broad vision: availability of conceptual models for both human genomics and viruses will provide important opportunities for research, especially if interconnected by the same human being, playing the role of virus host as well as provider of genomic and phenotype information.
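
As a rough illustration of the four perspectives named above, the following sketch groups them around a central sequence record. All class and field names are hypothetical, chosen only for this example; they are not the entities or attributes of VCM itself. The example values are those given in Notes 1 and 12.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names throughout: this is NOT the VCM schema, only a sketch
# of a sequence-centered record viewed from the four perspectives.


@dataclass
class Biological:
    taxonomy_id: int   # NCBI taxonomy ID of the virus
    strain_name: str
    host: str          # species of the host sample


@dataclass
class Analytical:
    annotations: List[str] = field(default_factory=list)
    nucleotide_variants: List[str] = field(default_factory=list)
    amino_acid_variants: List[str] = field(default_factory=list)


@dataclass
class Organizational:
    sequencing_project: Optional[str] = None  # left unset in this sketch


@dataclass
class Technical:
    experimental_technology: Optional[str] = None  # left unset in this sketch


@dataclass
class VirusSequence:
    accession: str
    biological: Biological
    analytical: Analytical = field(default_factory=Analytical)
    organizational: Organizational = field(default_factory=Organizational)
    technical: Technical = field(default_factory=Technical)


def by_taxon(sequences: List[VirusSequence], taxonomy_id: int) -> List[VirusSequence]:
    """Return the sequences whose virus has the given NCBI taxonomy ID."""
    return [s for s in sequences if s.biological.taxonomy_id == taxonomy_id]


# The reference sequence described in Note 12, identified as in Note 1.
reference = VirusSequence(
    accession="NC_045512",
    biological=Biological(2697049, "Wuhan-Hu-1", "Homo Sapiens"),
)
```

Keeping the sequence at the center and attaching one object per perspective means that a query such as `by_taxon` only touches the biological part of the record; this is a design choice of the sketch, not a claim about how VCM is implemented.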

Notes

  1. SARS-CoV-2 is generally identified by the NCBI taxonomy [18] ID 2697049.

  2. GenoSurf: http://gmql.eu/genosurf/; ViruSurf: http://gmql.eu/virusurf/.

  3. http://www.insdc.org/.

  4. https://www.ebi.ac.uk/ena/pathogens/.

  5. https://4virology.net/virology-ca-tools/vgo/.

  6. NGDC: https://bigd.big.ac.cn/; CNGB: https://db.cngb.org/.

  7. http://clinicaltrials.gov/.

  8. In RNA sequencing databases uracil (U) is replaced with thymine (T).

  9. https://www.ncbi.nlm.nih.gov/pubmed/.

  10. https://www.ncbi.nlm.nih.gov/bioproject/.

  11. https://en.wikipedia.org/wiki/Nucleic_acid_notation#IUPAC_notation.

  12. It represents the positive-sense, single-stranded RNA virus (from 0 to the 29903rd base) of the NC_045512 RefSeq staff-curated complete sequence (StrainName “Wuhan-Hu-1”), collected in China from a “Homo Sapiens” HostSample in December 2019.

  13. http://glue-tools.cvr.gla.ac.uk/images/projectModel.png.

  14. https://www.covid19hg.org/.

  15. We coordinated about 50 active participants and released the “Freeze 1” version of the data dictionary on April 16, 2020 (http://gmql.eu/phenotype/).

References

  1. Amid, C., et al.: The European nucleotide archive in 2019. Nucleic Acids Res. 48(D1), D70–D76 (2020)

  2. Babenko, V., et al.: GUS the genomics unified schema a platform for genomics databases. http://www.gusdb.org/. Accessed 1 Aug 2020

  3. Bairoch, A.: The cellosaurus, a cell-line knowledge resource. J. Biomol. Tech. JBT 29(2), 25 (2018)

  4. Bernasconi, A., et al.: Exploiting conceptual modeling for searching genomic metadata: a quantitative and qualitative empirical study. In: Guizzardi, G., et al. (eds.) Advances in Conceptual Modeling, pp. 83–94. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34146-6_8

  5. Bernasconi, A., et al.: From a conceptual model to a knowledge graph for genomic datasets. In: Laender, A.H.F., et al. (eds.) Conceptual Modeling, pp. 352–360. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33223-5_29

  6. Bernasconi, A., et al.: META-BASE: a novel architecture for large-scale genomic metadata integration. IEEE/ACM Trans. Comput. Biol. Bioinform. (2020)

  7. Bernasconi, A., et al.: The road towards data integration in human genomics: players, steps and interactions. Briefings Bioinform. 4, 80 (2020)

  8. Bernasconi, A., et al.: Conceptual modeling for genomics: building an integrated repository of open data. In: Mayr, H.C., et al. (eds.) Conceptual Modeling, pp. 325–339. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69904-2_26

  9. Bonifati, A., et al.: Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology 10(4), 452–483 (2001)

  10. Canakoglu, A., et al.: GenoSurf: metadata driven semantic search system for integrated genomic datasets. Database 2019, 132 (2019)

  11. Canakoglu, A., et al.: ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research, gkaa846 (2020). https://doi.org/10.1093/nar/gkaa846

  12. Cingolani, P., et al.: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6(2), 80–92 (2012)

  13. Consortium, G.O.: The gene ontology resource: 20 years and still going strong. Nucleic Acids Res. 47(D1), D330–D338 (2019)

  14. Corman, V.M., et al.: Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance 25(3), 200045 (2020)

  15. Cornell, M., et al.: GIMS: an integrated data storage and analysis environment for genomic and functional data. Yeast 20(15), 1291–1306 (2003)

  16. De Francesco, E., et al.: A summary of genomic databases: overview and discussion. In: Biomedical Data and Applications, pp. 37–54. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02193-0_3

  17. Do, H.H., et al.: Flexible integration of molecular-biological annotation data: the genmapper approach. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 811–822. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24741-8_47

  18. Federhen, S.: The NCBI taxonomy database. Nucleic Acids Res. 40(D1), D136–D143 (2012)

  19. Ferrandis, A.M.M., et al.: Applying the principles of an ontology-based approach to a conceptual schema of human genome. In: Ng, W., Storey, V.C., Trujillo, J.C. (eds.) ER 2013. LNCS, vol. 8217, pp. 471–478. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41924-9_40

  20. Flicek, P., et al.: The European Genotype Archive: Background and implementation [white paper] (2007). https://www.ebi.ac.uk/ega/sites/ebi.ac.uk.ega/files/documents/ega_whitepaper.pdf

  21. Gudbjartsson, D.F., et al.: Spread of SARS-CoV-2 in the Icelandic population. New Engl. J. Med. 382, 2302–2315 (2020)

  22. Guérin, E., et al.: Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW. In: Ludäscher, B., Raschid, L. (eds.) DILS 2005. LNCS, vol. 3615, pp. 158–174. Springer, Heidelberg (2005). https://doi.org/10.1007/11530084_14

  23. Hadfield, J., et al.: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34(23), 4121–4123 (2018)

  24. Hatcher, E.L., et al.: Virus variation resource-improved response to emergent viral outbreaks. Nucleic Acids Res. 45(D1), D482–D490 (2017)

  25. Hulo, C., et al.: ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res. 39, D576–D582 (2011)

  26. Junior, I.J.M., et al.: The global population of SARS-CoV-2 is composed of six major subtypes. bioRxiv (2020)

  27. Koonin, E.V., et al.: Global organization and proposed megataxonomy of the virus world. Microbiol. Mol. Biol. Rev. 84(2), 156 (2020)

  28. Lescure, F.X., et al.: Clinical and virological data of the first cases of COVID-19 in Europe: a case series. The Lancet Infect. Dis. 20, 6 (2020)

  29. Lu, G., et al.: Influenza A virus informatics: genotype-centered database and genotype annotation. In: Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007), pp. 76–83. IEEE (2007)

  30. Lu, R., et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 395(10224), 565–574 (2020)

  31. Médigue, C., et al.: Imagene: an integrated computer environment for sequence annotation and analysis. Bioinformatics (Oxford, England) 15(1), 2–15 (1999)

  32. Needleman, S.B., et al.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)

  33. Okayama, T., et al.: Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. Bioinformatics (Oxford, England) 14(6), 472–478 (1998)

  34. O’Leary, N.A., et al.: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44(D1), D733–D745 (2015)

  35. Palacio, A.L., et al.: A method to identify relevant genome data: conceptual modeling for the medicine of precision. In: Trujillo, J.C., et al. (eds.) ER 2018. LNCS, vol. 11157, pp. 597–609. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00847-5_44

  36. Paton, N.W., et al.: Conceptual modelling of genomic information. Bioinformatics 16(6), 548–557 (2000)

  37. Pickett, B.E., et al.: ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res. 40(D1), D593–D598 (2012)

  38. Nomenclature Committee of the International Union of Biochemistry (NC-IUB): Nomenclature for incompletely specified bases in nucleic acid sequences: Recommendations 1984. Proceedings of the National Academy of Sciences of the United States of America 83(1), 4–8 (1986)

  39. UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506–D515 (2019)

  40. Reyes Román, J.F., Pastor, Ó., Casamayor, J.C., Valverde, F.: Applying conceptual modeling to better understand the human genome. In: Comyn-Wattiau, I., Tanaka, K., Song, I.-Y., Yamamoto, S., Saeki, M. (eds.) ER 2016. LNCS, vol. 9974, pp. 404–412. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46397-1_31

  41. Sayers, E.: The E-utilities in-depth: parameters, syntax and more. Entrez Programming Utilities Help [Internet] (2009). https://www.ncbi.nlm.nih.gov/books/NBK25499/

  42. Sayers, E.W., et al.: GenBank. Nucleic Acids Res. 47(D1), D94–D99 (2019)

  43. Sharma, D., et al.: Unraveling the web of viroinformatics: computational tools and databases in virus research. J. Virol. 89(3), 1489–1501 (2015)

  44. Shu, Y., et al.: GISAID: Global initiative on sharing all influenza data-from vision to reality. Eurosurveill. 22(13), 30494 (2017)

  45. Singer, J., et al.: CoV-Glue: a web application for tracking SARS-CoV-2 genomic variation (2020). Preprints 2020, 2020060225

  46. Smith, B., et al.: The obo foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25(11), 1251–1255 (2007)

  47. Stano, M., et al.: viruSITE-integrated database for viral genomics. Database 2016, e00152 (2016)

  48. Tahsin, T., et al.: Named entity linking of geospatial and host metadata in genbank for advancing biomedical research. Database 2017, 93 (2017)

  49. Tang, X., et al.: On the origin and continuing evolution of SARS-CoV-2. Nat. Sci. Rev. (2020)

Acknowledgements

This research is funded by the ERC Advanced Grant 693174 GeCo (Data-Driven Genomic Computing), 2016–2021. The authors would like to thank Ilaria Capua, Luca Ferretti, Alice Fusaro, Susanna Lamers, Francesca Mari, Carla Mavian, Alessandra Renieri, Stephen Tsui, and Limsoon Wong for their valuable contributions during requirements elicitation and for their inspiration towards future developments of this research.

Author information

Corresponding author

Correspondence to Anna Bernasconi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bernasconi, A., Canakoglu, A., Pinoli, P., Ceri, S. (2020). Empowering Virus Sequence Research Through Conceptual Modeling. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds) Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12400. Springer, Cham. https://doi.org/10.1007/978-3-030-62522-1_29

  • DOI: https://doi.org/10.1007/978-3-030-62522-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62521-4

  • Online ISBN: 978-3-030-62522-1

  • eBook Packages: Computer Science, Computer Science (R0)
