Abstract
The pandemic outbreak of the coronavirus disease has drawn attention to the genetic mechanisms of viruses. We present the Viral Conceptual Model (VCM), centered on the virus sequence and described from four perspectives: biological (virus type and hosts/sample), analytical (annotations, nucleotide and amino acid variants), organizational (sequencing project) and technical (experimental technology).
VCM is inspired by GCM, our previously developed Genomic Conceptual Model, but it introduces many novel concepts, as viral sequences differ significantly from human genomes. When applied to the SARS-CoV-2 virus, complex conceptual queries on VCM can replicate the search results of recent articles, demonstrating strong potential for supporting virology research.
Our effort is part of a broad vision: the availability of conceptual models for both human genomics and viruses will provide important opportunities for research, especially if the two are interconnected through the same human being, who plays the role of virus host as well as provider of genomic and phenotype information.
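The structure outlined in the abstract, a central sequence entity described from four perspectives, can be illustrated with a minimal Python sketch. It is not the authors' schema: all class and attribute names are assumptions chosen for illustration, except StrainName and HostSample, which appear in the notes. The reference record uses the values reported in the notes (NC_045512, "Wuhan-Hu-1", taxonomy ID 2697049); the second record, its country and its variant are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Virus:  # biological perspective: virus type
    taxon_id: int
    taxon_name: str


@dataclass
class HostSample:  # biological perspective: host and sample
    host_taxon_name: str
    country: str
    collection_date: str


@dataclass
class SequencingProject:  # organizational perspective
    database_source: str


@dataclass
class ExperimentType:  # technical perspective
    sequencing_technology: Optional[str] = None


@dataclass
class AminoAcidVariant:  # analytical perspective
    protein: str
    position: int
    reference: str
    alternative: str


@dataclass
class Sequence:  # central entity, linked to all four perspectives
    accession_id: str
    strain_name: str
    is_reference: bool
    virus: Virus
    host_sample: HostSample
    project: SequencingProject
    experiment: ExperimentType
    aa_variants: List[AminoAcidVariant] = field(default_factory=list)


def find_sequences(sequences, taxon_id, country, protein, position):
    """Select sequences of one virus, collected in one country,
    that carry an amino acid variant at a given protein position."""
    return [
        s for s in sequences
        if s.virus.taxon_id == taxon_id
        and s.host_sample.country == country
        and any(v.protein == protein and v.position == position
                for v in s.aa_variants)
    ]


sars_cov_2 = Virus(taxon_id=2697049, taxon_name="SARS-CoV-2")

# Reference sequence, with values taken from the notes.
reference = Sequence(
    accession_id="NC_045512", strain_name="Wuhan-Hu-1", is_reference=True,
    virus=sars_cov_2,
    host_sample=HostSample("Homo Sapiens", "China", "2019-12"),
    project=SequencingProject("RefSeq"),
    experiment=ExperimentType(),
)

# Hypothetical record: accession, country, date and variant are placeholders.
example = Sequence(
    accession_id="EXAMPLE_0001", strain_name="example-strain", is_reference=False,
    virus=sars_cov_2,
    host_sample=HostSample("Homo Sapiens", "CountryX", "2020-03"),
    project=SequencingProject("ExampleDB"),
    experiment=ExperimentType("ExampleTech"),
    aa_variants=[AminoAcidVariant("S", 614, "D", "G")],
)

sequences = [reference, example]
hits = find_sequences(sequences, 2697049, "CountryX", "S", 614)
print([s.accession_id for s in hits])
```

A conceptual query in this sketch is a conjunction of predicates, each reaching the central entity through one perspective. That is one plausible reading of how such a star-like model supports combined biological and analytical filters.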
Notes
- 1. SARS-CoV-2 is generally identified by the NCBI taxonomy [18] ID 2697049.
- 2. GenoSurf: http://gmql.eu/genosurf/; ViruSurf: http://gmql.eu/virusurf/.
- 6. NGDC: https://bigd.big.ac.cn/; CNGB: https://db.cngb.org/.
- 8. In RNA sequencing databases uracil (U) is replaced with thymine (T).
- 12. It represents the positive-sense, single-stranded RNA virus (from base 0 to base 29903) of the NC_045512 RefSeq staff-curated complete sequence (StrainName “Wuhan-Hu-1”), collected in China from a “Homo Sapiens” HostSample in December 2019.
- 15. We coordinated about 50 active participants and released the “Freeze 1” version of the data dictionary on April 16, 2020 (http://gmql.eu/phenotype/).
References
Amid, C., et al.: The European nucleotide archive in 2019. Nucleic Acids Res. 48(D1), D70–D76 (2020)
Babenko, V., et al.: GUS: the Genomics Unified Schema, a platform for genomics databases. http://www.gusdb.org/. Accessed 1 Aug 2020
Bairoch, A.: The Cellosaurus, a cell-line knowledge resource. J. Biomol. Tech. 29(2), 25 (2018)
Bernasconi, A., et al.: Exploiting conceptual modeling for searching genomic metadata: a quantitative and qualitative empirical study. In: Guizzardi, G., et al. (eds.) Advances in Conceptual Modeling, pp. 83–94. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34146-6_8
Bernasconi, A., et al.: From a conceptual model to a knowledge graph for genomic datasets. In: Laender, A.H.F., et al. (eds.) Conceptual Modeling, pp. 352–360. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33223-5_29
Bernasconi, A., et al.: META-BASE: a novel architecture for large-scale genomic metadata integration. IEEE/ACM Trans. Comput. Biol. Bioinform. (2020)
Bernasconi, A., et al.: The road towards data integration in human genomics: players, steps and interactions. Briefings Bioinform. 4, 80 (2020)
Bernasconi, A., et al.: Conceptual modeling for genomics: building an integrated repository of open data. In: Mayr, H.C., et al. (eds.) Conceptual Modeling, pp. 325–339. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69904-2_26
Bonifati, A., et al.: Designing data marts for data warehouses. ACM Trans. Softw. Eng. Methodol. 10(4), 452–483 (2001)
Canakoglu, A., et al.: GenoSurf: metadata driven semantic search system for integrated genomic datasets. Database 2019, 132 (2019)
Canakoglu, A., et al.: ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Res., gkaa846 (2020). https://doi.org/10.1093/nar/gkaa846
Cingolani, P., et al.: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6(2), 80–92 (2012)
Gene Ontology Consortium: The gene ontology resource: 20 years and still going strong. Nucleic Acids Res. 47(D1), D330–D338 (2019)
Corman, V.M., et al.: Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance 25(3), 200045 (2020)
Cornell, M., et al.: GIMS: an integrated data storage and analysis environment for genomic and functional data. Yeast 20(15), 1291–1306 (2003)
De Francesco, E., et al.: A summary of genomic databases: overview and discussion. In: Biomedical Data and Applications, pp. 37–54. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02193-0_3
Do, H.H., et al.: Flexible integration of molecular-biological annotation data: the GenMapper approach. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 811–822. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24741-8_47
Federhen, S.: The NCBI taxonomy database. Nucleic Acids Res. 40(D1), D136–D143 (2012)
Ferrandis, A.M.M., et al.: Applying the principles of an ontology-based approach to a conceptual schema of human genome. In: Ng, W., Storey, V.C., Trujillo, J.C. (eds.) ER 2013. LNCS, vol. 8217, pp. 471–478. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41924-9_40
Flicek, P., et al.: The European Genotype Archive: Background and implementation [white paper] (2007). https://www.ebi.ac.uk/ega/sites/ebi.ac.uk.ega/files/documents/ega_whitepaper.pdf
Gudbjartsson, D.F., et al.: Spread of SARS-CoV-2 in the Icelandic population. New Engl. J. Med. 382, 2302–2315 (2020)
Guérin, E., et al.: Integrating and warehousing liver gene expression data and related biomedical resources in GEDAW. In: Ludäscher, B., Raschid, L. (eds.) DILS 2005. LNCS, vol. 3615, pp. 158–174. Springer, Heidelberg (2005). https://doi.org/10.1007/11530084_14
Hadfield, J., et al.: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34(23), 4121–4123 (2018)
Hatcher, E.L., et al.: Virus Variation Resource – improved response to emergent viral outbreaks. Nucleic Acids Res. 45(D1), D482–D490 (2017)
Hulo, C., et al.: ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res. 39, D576–D582 (2011)
Junior, I.J.M., et al.: The global population of SARS-CoV-2 is composed of six major subtypes. bioRxiv (2020)
Koonin, E.V., et al.: Global organization and proposed megataxonomy of the virus world. Microbiol. Mol. Biol. Rev. 84(2), 156 (2020)
Lescure, F.X., et al.: Clinical and virological data of the first cases of COVID-19 in Europe: a case series. The Lancet Infect. Dis. 20, 6 (2020)
Lu, G., et al.: Influenza A virus informatics: genotype-centered database and genotype annotation. In: Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007), pp. 76–83. IEEE (2007)
Lu, R., et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 395(10224), 565–574 (2020)
Médigue, C., et al.: Imagene: an integrated computer environment for sequence annotation and analysis. Bioinformatics 15(1), 2–15 (1999)
Needleman, S.B., et al.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
Okayama, T., et al.: Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. Bioinformatics 14(6), 472–478 (1998)
O’Leary, N.A., et al.: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44(D1), D733–D745 (2015)
Palacio, A.L., et al.: A method to identify relevant genome data: conceptual modeling for the medicine of precision. In: Trujillo, J.C., et al. (eds.) ER 2018. LNCS, vol. 11157, pp. 597–609. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00847-5_44
Paton, N.W., et al.: Conceptual modelling of genomic information. Bioinformatics 16(6), 548–557 (2000)
Pickett, B.E., et al.: ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res. 40(D1), D593–D598 (2012)
Nomenclature Committee of the International Union of Biochemistry (NC-IUB): Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Proc. Natl. Acad. Sci. U.S.A. 83(1), 4–8 (1986)
UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506–D515 (2019)
Reyes Román, J.F., Pastor, Ó., Casamayor, J.C., Valverde, F.: Applying conceptual modeling to better understand the human genome. In: Comyn-Wattiau, I., Tanaka, K., Song, I.-Y., Yamamoto, S., Saeki, M. (eds.) ER 2016. LNCS, vol. 9974, pp. 404–412. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46397-1_31
Sayers, E.: The E-utilities in-depth: parameters, syntax and more. Entrez Programming Utilities Help [Internet] (2009). https://www.ncbi.nlm.nih.gov/books/NBK25499/
Sayers, E.W., et al.: GenBank. Nucleic Acids Res. 47(D1), D94–D99 (2019)
Sharma, D., et al.: Unraveling the web of viroinformatics: computational tools and databases in virus research. J. Virol. 89(3), 1489–1501 (2015)
Shu, Y., et al.: GISAID: global initiative on sharing all influenza data – from vision to reality. Eurosurveillance 22(13), 30494 (2017)
Singer, J., et al.: CoV-Glue: a web application for tracking SARS-CoV-2 genomic variation (2020). Preprints 2020, 2020060225
Smith, B., et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25(11), 1251–1255 (2007)
Stano, M., et al.: viruSITE – integrated database for viral genomics. Database 2016, e00152 (2016)
Tahsin, T., et al.: Named entity linking of geospatial and host metadata in GenBank for advancing biomedical research. Database 2017, 93 (2017)
Tang, X., et al.: On the origin and continuing evolution of SARS-CoV-2. Natl. Sci. Rev. (2020)
Acknowledgements
This research is funded by the ERC Advanced Grant 693174 GeCo (Data-Driven Genomic Computing), 2016–2021. The authors would like to thank Ilaria Capua, Luca Ferretti, Alice Fusaro, Susanna Lamers, Francesca Mari, Carla Mavian, Alessandra Renieri, Stephen Tsui, and Limsoon Wong for their valuable contributions during requirements elicitation and for inspiring future developments of this research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bernasconi, A., Canakoglu, A., Pinoli, P., Ceri, S. (2020). Empowering Virus Sequence Research Through Conceptual Modeling. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds) Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12400. Springer, Cham. https://doi.org/10.1007/978-3-030-62522-1_29
DOI: https://doi.org/10.1007/978-3-030-62522-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62521-4
Online ISBN: 978-3-030-62522-1
eBook Packages: Computer Science, Computer Science (R0)