Abstract
Post-translational modification (PTM) is a significant biological process with a tremendous impact on protein function in both eukaryotic and prokaryotic cells. Lysine malonylation is a recently discovered post-translational modification that is associated with many diseases, such as type 2 diabetes and various types of cancer. Compared with experimental identification of malonylation sites, computational methods can save time and reduce cost. In this paper, we combine principal component analysis (PCA) with a support vector machine (SVM) to propose a new computational model, Mal-prec (malonylation prediction). First, one-hot encoding, physicochemical properties and the composition of k-spaced amino acid pairs (CKSAAP) are used to extract sequence features. Second, we preprocess the data, select the best feature subset by PCA, and predict malonylation sites with the SVM. Five-fold cross-validation then shows that Mal-prec achieves better prediction performance than other methods. In 10-fold cross-validation on independent data sets, the AUC (area under the receiver operating characteristic curve) reached 96.39%. Mal-prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms the existing prediction tools reported in the literature and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins.
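The feature-extraction step can be illustrated with a minimal sketch of two of the three named encodings, one-hot and CKSAAP, applied to a peptide window around a candidate lysine. It assumes windows use the 20 standard amino acids, with any other character (for example a hypothetical `X` padding symbol) treated as non-standard, and that each k-spaced pair count is normalized by the number of pairs in the window. The function names and the default `k_max` are illustrative choices, not taken from the paper. The physicochemical encoding is omitted because it depends on property tables the abstract does not specify.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def one_hot(window: str) -> List[float]:
    """20 binary features per position; non-standard residues (e.g. 'X' padding) become all zeros."""
    features: List[float] = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def cksaap(window: str, k_max: int = 3) -> List[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 features per k)."""
    features: List[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of residue pairs separated by k residues
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs containing non-standard residues are skipped
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


def encode(window: str, k_max: int = 3) -> List[float]:
    """Concatenate the one-hot and CKSAAP vectors for one peptide window."""
    return one_hot(window) + cksaap(window, k_max)


if __name__ == "__main__":
    vec = encode("AGKLM", k_max=1)  # toy 5-residue window centred on K
    print(len(vec))  # 5 * 20 one-hot features + 2 * 400 CKSAAP features = 900
```

For a window of length L, the vector has 20L + 400(k_max + 1) entries, so longer windows and larger k_max grow the feature space quickly. That growth is one motivation for a dimensionality-reduction step such as PCA before classification.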
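The preprocessing, PCA and SVM steps, scored by cross-validated AUC, can be sketched with scikit-learn as a single pipeline. This is an illustrative sketch, not the authors' implementation. The standard-scaling step, the 95% explained-variance threshold, the RBF kernel settings and the synthetic data are all assumptions chosen for demonstration. Placing the scaler and PCA inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into the projection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_pca_svm(X, y, n_components=0.95, n_splits=5, seed=0):
    """Mean ROC AUC of a standardize -> PCA -> RBF-SVM pipeline under stratified k-fold CV."""
    model = make_pipeline(
        StandardScaler(),                  # zero mean, unit variance per feature (assumed preprocessing)
        PCA(n_components=n_components),    # keep enough components to explain 95% of the variance
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # "roc_auc" scoring ranks samples by the SVM decision function
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return float(scores.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 200 samples x 50 features, positives shifted in the first 5 features
    X = rng.normal(size=(200, 50))
    y = np.array([0] * 100 + [1] * 100)
    X[y == 1, :5] += 1.5
    print(f"mean 5-fold AUC: {evaluate_pca_svm(X, y):.3f}")
```

In practice, `X` would hold encoded peptide windows and `y` would mark malonylated (1) versus non-malonylated (0) lysines. `n_splits=10` gives the 10-fold variant mentioned above.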
Acknowledgement
This work is supported by the Fundamental Research Funds for the Central Universities (2020QN89), the Xuzhou Science and Technology Plan Project (KC19142), the talent project of ‘Qingtan Scholar’ of Zaozhuang University, the Jiangsu Provincial Natural Science Foundation, China (SBK2019040953), the Youth Innovation Team of Scientific Research Foundation of the Higher Education Institutions of Shandong Province, China (2019KJM006), the Key Research Program of the Science Foundation of Shandong Province (ZR2020KE001), the PhD Research Startup Foundation of Zaozhuang University (2014BS13), the Zaozhuang University Foundation (2015YY02), the Natural Science Foundation of China (61902337), the Natural Science Fund for Colleges and Universities in Jiangsu Province (19KJB520016), the Xuzhou Natural Science Foundation (KC21047), and Young Talents of Science and Technology in Jiangsu.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Chen, B., Gu, Y., Yang, B., Bao, W. (2022). The Identifications of Post Translational Modification Sites with Capsule Network. In: Jiang, D., Song, H. (eds) Simulation Tools and Techniques. SIMUtools 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-030-97124-3_42
DOI: https://doi.org/10.1007/978-3-030-97124-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97123-6
Online ISBN: 978-3-030-97124-3
eBook Packages: Computer Science, Computer Science (R0)