Abstract
Protein structural classification is critical in bioinformatics. In this study, a simple, connected graph was used to represent a 3D protein structure: each node represented an amino acid, and each edge represented a distance-based contact between two amino acids. The B-factor (atomic displacement parameter) was then used to substantially reduce the number of nodes and edges in each graph. A graph mining approach was applied to identify the critical subgraphs among these graphs, which can then be used to classify protein structural families. In an experimental study, characteristic substructural patterns were identified for several protein families in the SCOP database.
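The graph construction and B-factor reduction summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each residue is reduced to one representative atom (assumed to be C-alpha) with coordinates in ångströms and a B-factor; two residues are joined by an edge when their atoms lie within a hypothetical `contact_cutoff` of 8.0 Å; and B-factor filtering is modelled by the hypothetical rule of dropping residues whose B-factor exceeds `max_b_factor`. The names `Residue`, `build_contact_graph` and `is_connected` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    label: str       # amino-acid type, e.g. "GLY"
    x: float         # coordinates of one representative atom (assumed C-alpha), in angstroms
    y: float
    z: float
    b_factor: float  # crystallographic B-factor of that atom


def build_contact_graph(residues, contact_cutoff=8.0, max_b_factor=None):
    """Build a simple undirected residue contact graph.

    Returns (labels, edges): labels maps each kept residue index to its
    amino-acid label; edges is a set of (i, j) pairs with i < j whose
    representative atoms lie within contact_cutoff angstroms.
    If max_b_factor is given, residues with a larger B-factor are dropped
    first (hypothetical filtering rule).
    """
    kept = [i for i, r in enumerate(residues)
            if max_b_factor is None or r.b_factor <= max_b_factor]
    labels = {i: residues[i].label for i in kept}
    edges = set()
    for pos, i in enumerate(kept):
        for j in kept[pos + 1:]:  # j > i, so no self-loops or duplicate edges
            a, b = residues[i], residues[j]
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= contact_cutoff:
                edges.add((i, j))
    return labels, edges


def is_connected(labels, edges):
    """Return True if every kept node is reachable from every other (depth-first search)."""
    if not labels:
        return False  # treat an empty graph as unusable
    adj = {n: set() for n in labels}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    start = next(iter(labels))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(labels)


# Toy example: four residues on a line, 3.8 angstroms apart.
residues = [
    Residue("GLY", 0.0, 0.0, 0.0, 10.0),
    Residue("ALA", 3.8, 0.0, 0.0, 12.0),
    Residue("LEU", 7.6, 0.0, 0.0, 50.0),
    Residue("SER", 11.4, 0.0, 0.0, 15.0),
]
labels, edges = build_contact_graph(residues, max_b_factor=20.0)
print(sorted(edges), is_connected(labels, edges))  # [(0, 1), (1, 3)] True
```

Removing a high-B-factor residue removes its node and every edge incident to it, which is why filtering shrinks both counts at once. Because filtering can also disconnect the graph, `is_connected` lets a caller check the "simple and connected" requirement; how the original method handles disconnected graphs is not stated here.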
Notes
1. In the test protein graph G, nodes with the same label are enumerated only once.
2. In general, candidate subgraphs contain between 10 and 15 nodes.
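Using mined subgraphs to classify a protein can be illustrated with a labeled subgraph containment test. In this sketch each graph is a pair `(labels, edges)`: a dict mapping node IDs to amino-acid labels and a set of undirected edge tuples. Containment is checked by plain backtracking for non-induced subgraph isomorphism, which is plausibly tractable for candidate patterns of 10 to 15 nodes but is not necessarily the matching procedure the authors used. The scoring rule in `classify` (pick the family with the most occurring patterns) and the family names are hypothetical.

```python
def contains_subgraph(p_labels, p_edges, t_labels, t_edges):
    """Return True if the pattern graph occurs in the target graph.

    Searches for an injective mapping of pattern nodes to target nodes that
    preserves node labels and maps every pattern edge onto a target edge
    (non-induced subgraph isomorphism), using simple backtracking.
    """
    def adjacency(labels, edges):
        adj = {n: set() for n in labels}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    p_adj = adjacency(p_labels, p_edges)
    t_adj = adjacency(t_labels, t_edges)
    # Map high-degree pattern nodes first so that failures are found early.
    order = sorted(p_labels, key=lambda n: -len(p_adj[n]))
    mapping, used = {}, set()

    def extend(k):
        if k == len(order):
            return True
        p = order[k]
        for t in t_labels:
            if t in used or t_labels[t] != p_labels[p]:
                continue
            if len(t_adj[t]) < len(p_adj[p]):
                continue  # target node cannot host all pattern edges
            # Every already-mapped pattern neighbour must be a target neighbour.
            if any(q in mapping and mapping[q] not in t_adj[t] for q in p_adj[p]):
                continue
            mapping[p] = t
            used.add(t)
            if extend(k + 1):
                return True
            del mapping[p]
            used.discard(t)
        return False

    return extend(0)


def classify(target, family_patterns):
    """Hypothetical rule: return the family whose patterns occur most often in target."""
    scores = {fam: sum(contains_subgraph(*pat, *target) for pat in pats)
              for fam, pats in family_patterns.items()}
    return max(scores, key=scores.get)


target = ({0: "GLY", 1: "ALA", 2: "LEU", 3: "SER"},
          {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)})
triangle = ({"a": "GLY", "b": "ALA", "c": "LEU"}, {("a", "b"), ("b", "c"), ("a", "c")})
ser_gly = ({"x": "SER", "y": "GLY"}, {("x", "y")})
families = {"family_1": [ser_gly], "family_2": [triangle]}
print(classify(target, families))  # family_2
```

The search is exponential in the worst case, which is one reason a bound on candidate subgraph size matters in practice.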
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hsieh, SY., Lee, CW., Yang, ZY., Wang, HW., Yu, JH. (2015). A Novel Algorithm for Classifying Protein Structure Familiar by Using the Graph Mining Approach. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_72
DOI: https://doi.org/10.1007/978-3-319-22180-9_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22179-3
Online ISBN: 978-3-319-22180-9
eBook Packages: Computer Science (R0)