Abstract
There are over 1100 different databases available containing primary and derived data of interest to research biologists. It is inevitable that many of these databases contain overlapping, related or conflicting information. Data integration methods are being developed to address these issues by providing a consolidated view over multiple databases. However, a key challenge for data integration is the identification of links between closely related entries in different life sciences databases when there is no direct information that provides a reliable cross-reference. Here we describe and evaluate three data integration methods to address this challenge in the context of a graph-based data integration framework (the ONDEX system). A key result presented in this paper is a quantitative evaluation of their performance in two different situations: the integration and analysis of different metabolic pathways resources and the mapping of equivalent elements between the Gene Ontology and a nomenclature describing enzyme function.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Taubert, J., Hindle, M., Lysenko, A., Weile, J., Köhler, J., Rawlings, C.J. (2009). Linking Life Sciences Data Using Graph-Based Mapping. In: Paton, N.W., Missier, P., Hedeler, C. (eds) Data Integration in the Life Sciences. DILS 2009. Lecture Notes in Computer Science, vol 5647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02879-3_3
DOI: https://doi.org/10.1007/978-3-642-02879-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02878-6
Online ISBN: 978-3-642-02879-3
eBook Packages: Computer Science, Computer Science (R0)