Abstract
Predicting the affinity between two proteins is one of the most relevant challenges in bioinformatics and one of the most useful for biotechnological and pharmaceutical applications. Current prediction methods rely on structural information about the interaction complexes, but predicting protein structures is computationally expensive. Machine learning methods have emerged as an alternative. Structure-based predictive methods for protein affinity already exist; for linear (sequence) information, however, there are no development guidelines for building predictive models, so several alternatives for processing the data and training the models must be explored. This work explores different options for building predictive protein interaction models with deep learning architectures and classical machine learning algorithms, evaluating numerical representation methods and transformation techniques that represent structural complexes using linear information. Six types of predictive tasks were explored, covering affinity and the evaluation of mutational variants and their effect on the interaction complex. We show that classical machine learning and convolutional network-based methods perform better than graph convolutional network methods for studying mutational variants. In contrast, graph-based methods perform better on affinity or association-constant problems, using only the linear information of the protein sequences. Finally, we present an illustrative use case, show how to use the developed models, discuss the limitations of the explored methods, and comment on future development strategies for improving the studied processes.
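To make the idea of a numerical representation built from linear information concrete, the following is a minimal sketch, assuming a plain one-hot encoding with zero padding. The abstract does not say which encodings the chapter evaluates, so the encoding choice and the names `one_hot_encode` and `encode_pair` are illustrative assumptions, not the authors' method.

```python
# Illustrative assumption: one-hot encoding with zero padding. The abstract
# does not state which numerical representations the chapter actually uses.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length):
    """Return a flat one-hot vector of length max_length * 20.

    Sequences shorter than max_length are zero-padded; longer ones are
    truncated. Residues outside the 20 canonical letters stay all-zero.
    """
    width = len(AMINO_ACIDS)
    vector = [0] * (max_length * width)
    for position, residue in enumerate(sequence[:max_length]):
        index = AA_INDEX.get(residue)
        if index is not None:
            vector[position * width + index] = 1
    return vector


def encode_pair(seq_a, seq_b, max_length):
    """Concatenate the encodings of the two sequences of a candidate complex."""
    return one_hot_encode(seq_a, max_length) + one_hot_encode(seq_b, max_length)


# Hypothetical toy sequences, far shorter than real proteins.
features = encode_pair("MKT", "ACDX", max_length=5)
print(len(features), sum(features))
```

A fixed-length vector of this kind could be passed to a classical regressor or classifier; a graph-based model would need a different input structure, which this sketch does not cover.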
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Medina-Ortiz, D., Salinas, P., Cabas-Moras, G., Durán-Verdugo, F., Olivera-Nappa, Á., Uribe-Paredes, R. (2023). Exploring Machine Learning Algorithms and Numerical Representations Strategies to Develop Sequence-Based Predictive Models for Protein Networks. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_16
DOI: https://doi.org/10.1007/978-3-031-36805-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36804-2
Online ISBN: 978-3-031-36805-9
eBook Packages: Computer Science, Computer Science (R0)