Abstract
In this paper, we adapt a statistical learning approach, inspired by automated topic segmentation techniques for speech-recognized documents, to the challenging protein segmentation problem in the context of G-protein coupled receptors (GPCRs). Each GPCR consists of seven transmembrane helices separated by alternating extracellular and intracellular loops. Viewing the helices, the extracellular loops, and the intracellular loops as three different topics, the problem of segmenting the protein amino acid sequence according to its secondary structure is analogous to the problem of topic segmentation. The method involves building an n-gram language model for each ‘topic’ and comparing the models' performance in predicting the current amino acid to determine whether a boundary occurs at the current position. This is a distinctly different approach to protein segmentation from the Markov models that have been used previously, and its commendable results are evidence of the benefit of applying machine learning and language technologies to bioinformatics.
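The idea of competing per-topic n-gram models can be illustrated with a minimal sketch. This is not the authors' implementation: the class `NGramModel`, the functions `label_positions` and `boundaries`, the add-one smoothing, the window size, and the toy training strings are all assumptions made for illustration, and only two topics are used instead of three to keep the example short.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class NGramModel:
    """Hypothetical add-one smoothed n-gram model over amino acid letters."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = Counter()    # (context, residue) -> count
        self.context_counts = Counter()  # context -> count

    def train(self, segments):
        for seg in segments:
            padded = "^" * (self.n - 1) + seg  # "^" marks the segment start
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngram_counts[(context, padded[i])] += 1
                self.context_counts[context] += 1

    def log_prob(self, history, residue):
        # Keep only the last n-1 symbols of the history as context.
        context = history[-(self.n - 1):] if self.n > 1 else ""
        num = self.ngram_counts[(context, residue)] + 1
        den = self.context_counts[context] + len(AMINO_ACIDS)
        return math.log(num / den)


def label_positions(sequence, models, window=2):
    """Assign each position to the topic whose model best predicts
    the residues in a small window around it (an assumed heuristic)."""
    pad = max(m.n for m in models.values()) - 1
    padded = "^" * pad + sequence
    labels = []
    for i in range(len(sequence)):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        scores = {
            topic: sum(model.log_prob(padded[:j + pad], sequence[j])
                       for j in range(lo, hi))
            for topic, model in models.items()
        }
        labels.append(max(scores, key=scores.get))
    return labels


def boundaries(labels):
    """Positions where the winning topic changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy data (assumed): a leucine-only "helix" and a lysine-only "loop".
helix, loop = NGramModel(n=2), NGramModel(n=2)
helix.train(["LLLLLLLL"])
loop.train(["KKKKKKKK"])
models = {"helix": helix, "loop": loop}

labels = label_positions("KKKKKKLLLLLLKKKKKK", models)
print(boundaries(labels))  # [6, 12]
```

A boundary is reported wherever the best-scoring model changes from one position to the next. In a realistic setting the models would be trained on annotated helix and loop segments, and the smoothing and window choices would need tuning.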
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, B.Y.M., Carbonell, J.G., Klein-Seetharaman, J. (2005). A Machine Text-Inspired Machine Learning Approach for Identification of Transmembrane Helix Boundaries. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_3
DOI: https://doi.org/10.1007/11425274_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0)