Exploring Machine Learning Algorithms and Numerical Representations Strategies to Develop Sequence-Based Predictive Models for Protein Networks

  • Conference paper
Computational Science and Its Applications – ICCSA 2023 (ICCSA 2023)

Abstract

Predicting the affinity between two proteins is one of the most relevant challenges in bioinformatics and one of the most useful for biotechnological and pharmaceutical applications. Current prediction methods rely on the structural information of the interaction complexes, but predicting protein structures carries an enormous computational cost. Machine learning methods have emerged as an alternative. Predictive methods for protein affinity based on structural information already exist; for linear information, however, there are no development guidelines for building predictive models, so several alternatives for processing the data and developing the models must be explored. This work explores different options for building predictive protein interaction models with deep learning architectures and classical machine learning algorithms, evaluating numerical representation methods and transformation techniques that represent structural complexes using linear information. Six types of predictive tasks were explored, covering affinity and the evaluation of mutational variants and their effect on the interaction complex. We show that classical machine learning and convolutional network-based methods perform better than graph convolutional network methods for studying mutational variants. In contrast, graph-based methods perform better on affinity or association-constant problems, using only the linear information of the protein sequences. Finally, we present an illustrative use case, show how to use the developed models, discuss the limitations of the explored methods, and comment on future development strategies for improving the studied processes.
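
To make the idea of a numerical representation built from linear information concrete, the following minimal sketch encodes a pair of protein sequences as a fixed-length feature vector. It uses amino acid composition, a generic encoding chosen here only for simplicity; the abstract does not state which encodings the authors evaluated, and the function names and example sequences are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X, gaps) are ignored in this sketch.
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Concatenate both composition vectors into one 40-value feature vector."""
    return composition(seq_a) + composition(seq_b)


# Hypothetical sequences, used only to show the shape of the output.
features = encode_pair("MKTAYIAKQR", "GSHMLEDPVD")
print(len(features))
```

A vector of this kind could be passed to any classical regressor or classifier. Composition discards residue order, which is one reason order-preserving encodings and learned embeddings are commonly considered alongside it.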

Author information

Corresponding author

Correspondence to David Medina-Ortiz.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Medina-Ortiz, D., Salinas, P., Cabas-Moras, G., Durán-Verdugo, F., Olivera-Nappa, Á., Uribe-Paredes, R. (2023). Exploring Machine Learning Algorithms and Numerical Representations Strategies to Develop Sequence-Based Predictive Models for Protein Networks. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_16

  • DOI: https://doi.org/10.1007/978-3-031-36805-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36804-2

  • Online ISBN: 978-3-031-36805-9

  • eBook Packages: Computer Science, Computer Science (R0)
