Abstract
The Linking Open Data (LOD) cloud is a global data space for publishing and linking structured data on the Web, intended to facilitate the integration, exchange, and processing of data. The LOD cloud already includes many datasets related to biology. Nevertheless, most datasets about protein interactions do not use metadata standards, so they do not meet the LOD requirements and consequently hamper data integration. This problem affects information retrieval, especially with respect to dataset provenance and reuse in further prediction experiments. This paper proposes an ontology to describe and unite the four main kinds of data in a single prediction experiment environment: (i) information about the experiment itself; (ii) descriptions of and references to the datasets used in an experiment; (iii) information about each protein involved in the candidate pairs, that is, the biological information that describes the protein, which normally requires integration with other datasets; and, finally, (iv) information about the prediction scores, organized by evidence, and the final prediction. Additionally, we present case studies that illustrate the relevance of our proposal by showing how queries can retrieve useful information.
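As an illustration only, the four kinds of data listed above can be sketched as plain Python data classes, together with one simple query over them. All class, field, and function names, as well as the identifiers, evidence labels, and scores, are hypothetical placeholders chosen for this sketch; they are not taken from the proposed ontology, which is expressed in ontology terms rather than as Python code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dataset:
    """(ii) Description of and reference to a dataset used in an experiment."""
    name: str
    reference: str


@dataclass
class Protein:
    """(iii) A protein and the biological information that describes it."""
    identifier: str
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class PairPrediction:
    """(iv) Scores per evidence type plus the final prediction for one pair."""
    protein_a: Protein
    protein_b: Protein
    scores_by_evidence: Dict[str, float]
    final_score: float


@dataclass
class Experiment:
    """(i) Information about the experiment itself, linking the other parts."""
    description: str
    datasets: List[Dataset] = field(default_factory=list)
    predictions: List[PairPrediction] = field(default_factory=list)


def pairs_supported_by(
    experiment: Experiment, evidence: str, threshold: float
) -> List[Tuple[str, str]]:
    """Return identifier pairs whose score for `evidence` meets `threshold`.

    A pair with no score for the requested evidence is treated as scoring 0.0.
    """
    return [
        (p.protein_a.identifier, p.protein_b.identifier)
        for p in experiment.predictions
        if p.scores_by_evidence.get(evidence, 0.0) >= threshold
    ]


# Placeholder data: identifiers, evidence labels, and scores are made up.
p1, p2, p3 = Protein("P1"), Protein("P2"), Protein("P3")
example = Experiment(
    description="hypothetical prediction run",
    datasets=[Dataset("example-dataset", "https://example.org/dataset")],
    predictions=[
        PairPrediction(p1, p2, {"sequence": 0.8, "annotation": 0.3}, 0.6),
        PairPrediction(p1, p3, {"sequence": 0.2}, 0.2),
    ],
)

print(pairs_supported_by(example, "sequence", 0.5))
```

Keeping the per-evidence scores separate from the final score mirrors the distinction drawn in item (iv), so a query can filter on a single kind of evidence without losing the combined prediction.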
Notes
prefix ppi: <https://www.ypublish.info/protein_interaction_domain_ontology#>
prefix annot: <https://www.ypublish.info/protein_annotation_information#>
prefix prov: <https://www.ypublish.info/provenance_information#>
prefix result: <https://www.ypublish.info/prediction_results_information#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix biopax: <http://www.biopax.org/release/biopax-level2.owl#>
Acknowledgements
This work was partially funded by CAPES, CNPq, and FAPERJ.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Martins, Y.C., Cavalcanti, M.C., Arge, L.W.P., Ziviani, A., de Vasconcelos, A.T.R. (2019). OntoPPI: Towards Data Formalization on the Prediction of Protein Interactions. In: Garoufallou, E., Fallucchi, F., William De Luca, E. (eds) Metadata and Semantic Research. MTSR 2019. Communications in Computer and Information Science, vol 1057. Springer, Cham. https://doi.org/10.1007/978-3-030-36599-8_23
DOI: https://doi.org/10.1007/978-3-030-36599-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36598-1
Online ISBN: 978-3-030-36599-8
eBook Packages: Computer Science, Computer Science (R0)