Abstract
Machine learning methods are increasingly applied to model and predict biomolecular interactions, a setting in which efficient feature representation plays a vital role. To this end, BioSeq2vec, a unified deep representation learning framework for biological sequences, is proposed to extract discriminative features from any type of biological sequence. Given a sequence of arbitrary length, BioSeq2vec produces a fixed-length feature representation that can be supplied to various learning models. The performance of BioSeq2vec is evaluated on lncRNA-protein interaction prediction tasks. Experimental results show the superior performance of BioSeq2vec in biological sequence feature representation and suggest broad prospects for its use in genome informatics and computational biology studies.
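As a rough illustration of the arbitrary-length to fixed-length idea, the following minimal sketch runs a tiny recurrent encoder over a sequence and returns its final hidden state. It is not the BioSeq2vec implementation: the weights are random rather than pretrained, there is no decoder, and the k-mer tokenization, the function names (`kmers`, `embed`, `encode`), and the dimension of 8 are all assumptions made for this example.

```python
import math
import random


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (an assumed tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(kmer, dim):
    """Map a k-mer to a deterministic pseudo-random vector (stands in for a learned embedding)."""
    rng = random.Random(kmer)  # seeding with the k-mer string keeps the mapping reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(seq, k=3, dim=8, seed=0):
    """Return a fixed-length vector for a sequence of any length.

    The recurrent weights are random and untrained; a real model would
    learn them, for example by training an encoder-decoder to reconstruct
    its input.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    h = [0.0] * dim  # hidden state, updated once per k-mer
    for token in kmers(seq, k):
        x = embed(token, dim)
        h = [
            math.tanh(x[i] + sum(W[i][j] * h[j] for j in range(dim)))
            for i in range(dim)
        ]
    return h  # final hidden state: length `dim` regardless of len(seq)


rna_vec = encode("AUGGCUACGUUAGC")
protein_vec = encode("MKVLAAGIVGLLLAQPSFA" * 10)
print(len(rna_vec), len(protein_vec))
```

Both calls yield vectors of the same length even though the inputs differ in length and alphabet, which is the property that lets a downstream classifier consume either kind of sequence.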
Acknowledgement
HCY, ZHY and XRS conceived and designed the algorithm, carried out analyses, prepared the data sets, carried out experiments, and wrote the manuscript. DSH and ZHG performed and analyzed experiments and wrote the manuscript. All authors read and approved the final manuscript.
Funding
This work is supported by the National Outstanding Youth Science Foundation of NSFC, under grant 61722212, and the National Natural Science Foundation of China, under grants 61873212, 61861146002, and 61732012.
Ethics declarations
The authors declare that they have no conflict of interest.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yi, HC., You, ZH., Su, XR., Huang, DS., Guo, ZH. (2020). A Unified Deep Biological Sequence Representation Learning with Pretrained Encoder-Decoder Model. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2020. Lecture Notes in Computer Science, vol 12464. Springer, Cham. https://doi.org/10.1007/978-3-030-60802-6_30
DOI: https://doi.org/10.1007/978-3-030-60802-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60801-9
Online ISBN: 978-3-030-60802-6
eBook Packages: Computer Science (R0)