Abstract
Protein function is closely tied to the classification of proteins into hierarchical levels in which proteins share the same or similar functions. One of the most relevant protein classification schemes is the Structural Classification of Proteins (SCOP). SCOP has one major drawback: because it relies on manual classification, new proteins are classified much more slowly than novel protein structures appear in the Protein Data Bank (PDB). In this work, we propose two approaches for automated protein classification. We extract protein descriptors from the structural coordinates stored in PDB files, then apply the C4.5 algorithm to select the most appropriate descriptor features for classifying proteins according to the SCOP hierarchy. The two novel approaches are a bottom-up classification flow and a multi-level classification approach. The results show that these approaches are much faster than other similar algorithms while achieving comparable accuracy.
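As a rough illustration of the feature-selection step, the sketch below ranks descriptor features by gain ratio, the split criterion that C4.5 uses. It is not the authors' implementation: the feature names (`helix_content`, `chain_length`), the toy values, and the class labels are hypothetical, and the descriptors are assumed to be already discretized, whereas C4.5 itself also handles continuous attributes by thresholding.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete feature with respect to the class labels."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on this feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(features, labels):
    """Return (feature name, gain ratio) pairs, best feature first."""
    scores = [(name, gain_ratio(values, labels)) for name, values in features.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-discretized descriptors for four proteins.
features = {
    "helix_content": ["high", "high", "low", "low"],
    "chain_length": ["short", "long", "short", "long"],
}
# Hypothetical class labels for the same four proteins.
labels = ["a", "a", "b", "b"]

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data `helix_content` separates the two classes perfectly while `chain_length` carries no information about them, so the former ranks first. A real pipeline would compute such scores over descriptors extracted from PDB coordinates.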
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kalajdziski, S., Pepik, B., Ivanovska, I., Mirceva, G., Trivodaliev, K., Davcev, D. (2010). Automated Structural Classification of Proteins by Using Decision Trees and Structural Protein Features. In: Davcev, D., Gómez, J.M. (eds) ICT Innovations 2009. ICT Innovations 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10781-8_15
DOI: https://doi.org/10.1007/978-3-642-10781-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10780-1
Online ISBN: 978-3-642-10781-8
eBook Packages: Engineering, Engineering (R0)