Cell-Level Pathway Scoring Comparison with a Biologically Constrained Variational Autoencoder

  • Conference paper
Computational Methods in Systems Biology (CMSB 2023)

Abstract

Unsupervised techniques are widely used to study the complex patterns that arise when analyzing genomic data at single-cell resolution. In particular, unsupervised deep learning models provide state-of-the-art solutions for the most common tasks in scRNA-seq data analysis. However, the biological usefulness of these complex models is limited by their black-box nature. Several lines of research have emerged to address this limitation, ranging from post hoc approximations to ante hoc modeling. In this work, we study the behavior of two biologically constrained variational autoencoders (ante hoc modeling). The first uses a one-layer architecture whose constraints come from signaling pathways. The second, which we propose here, uses a two-layer architecture that follows recent trends in mechanistic models of signal transduction. We use the representations learned by each model as proxies of signaling activity at the single-cell level. We evaluate the scoring models on a well-known public scRNA-seq dataset with a clearly established ground truth. Although both models capture the relevant signals, the one-layer architecture better captures the most pronounced differences, while the two-layer design learns more fine-grained features that can expose less prominent aspects of the data.
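The abstract does not give implementation details, so the following is only a minimal sketch of how a biologically constrained encoder of this kind is commonly built: a dense layer whose weights are multiplied by a binary membership mask, so that each latent unit only receives input from its own genes. All names, sizes, membership matrices, the intermediate "circuit" layer, and the tanh activation are hypothetical choices made for illustration. Only the deterministic encoder path is shown; the sampling step, the decoder, the loss, and training are omitted.

```python
import numpy as np


def masked_dense(x, weights, mask):
    """Linear map whose weights are zeroed wherever mask == 0."""
    return x @ (weights * mask)


def one_layer_scores(x, w_gp, gene_to_pathway):
    """Genes -> pathways, constrained by pathway membership."""
    return masked_dense(x, w_gp, gene_to_pathway)


def two_layer_scores(x, w_gc, w_cp, gene_to_circuit, circuit_to_pathway):
    """Genes -> circuits -> pathways, each step constrained by a mask."""
    circuits = np.tanh(masked_dense(x, w_gc, gene_to_circuit))
    return masked_dense(circuits, w_cp, circuit_to_pathway)


# Hypothetical toy problem: 4 cells, 6 genes, 3 circuits, 2 pathways.
rng = np.random.default_rng(0)
n_cells, n_genes, n_circuits, n_pathways = 4, 6, 3, 2

# Assumed memberships (rows: inputs, columns: outputs).
# Genes 0-2 belong to pathway 0, genes 3-5 to pathway 1.
gene_to_pathway = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
# Genes 0-1 feed circuit 0, gene 2 feeds circuit 1, genes 3-5 feed circuit 2.
gene_to_circuit = np.array(
    [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]],
    dtype=float,
)
# Circuits 0-1 belong to pathway 0, circuit 2 to pathway 1.
circuit_to_pathway = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)

x = rng.normal(size=(n_cells, n_genes))           # toy expression matrix
w_gp = rng.normal(size=(n_genes, n_pathways))     # untrained weights
w_gc = rng.normal(size=(n_genes, n_circuits))
w_cp = rng.normal(size=(n_circuits, n_pathways))

scores_1 = one_layer_scores(x, w_gp, gene_to_pathway)
scores_2 = two_layer_scores(
    x, w_gc, w_cp, gene_to_circuit, circuit_to_pathway
)
print(scores_1.shape, scores_2.shape)
```

Because of the masks, changing the expression of genes outside a pathway leaves that pathway's score untouched in both variants, which is what makes each latent unit readable as a per-cell pathway score.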

Acknowledgements

This work has been partially supported by grants PID2020-117979RB-I00 and PID2020-117954RB-C22 from the Spanish Ministry of Science and Innovation, IMP/00019 from the Instituto de Salud Carlos III (ISCIII), and PIP-0087-2021 from Junta de Andalucía, co-funded with European Regional Development Funds (ERDF); and by the H2020 Programme of the European Union through the Marie Curie Innovative Training Network “Machine Learning Frontiers in Precision Medicine” (MLFPM) (GA 813533). The authors also acknowledge Junta de Andalucía for the postdoctoral contract of Carlos Loucera (PAIDI2020-DOC_00350), co-funded by the European Social Fund (FSE) 2014-2020.

Author information

Corresponding authors

Correspondence to Isabel A. Nepomuceno-Chamorro, Joaquin Dopazo or Carlos Loucera.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Gundogdu, P., Payá-Milans, M., Alamo-Alvarez, I., Nepomuceno-Chamorro, I.A., Dopazo, J., Loucera, C. (2023). Cell-Level Pathway Scoring Comparison with a Biologically Constrained Variational Autoencoder. In: Pang, J., Niehren, J. (eds) Computational Methods in Systems Biology. CMSB 2023. Lecture Notes in Computer Science, vol 14137. Springer, Cham. https://doi.org/10.1007/978-3-031-42697-1_5

  • DOI: https://doi.org/10.1007/978-3-031-42697-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42696-4

  • Online ISBN: 978-3-031-42697-1

  • eBook Packages: Computer Science, Computer Science (R0)
