Abstract
Cancer is a complex disease with significant social and economic impact. Advancements in high-throughput molecular assays and the reduced cost of performing high-quality multi-omic measurements have fuelled insights through machine learning. Previous studies have shown promise in using multiple omic layers to predict survival and stratify cancer patients. In this paper, we develop and report a Supervised Autoencoder (SAE) model for survival-based multi-omic integration, which improves upon previous work, as well as a Concrete Supervised Autoencoder (CSAE) model, which uses feature selection both to reconstruct the input features and to predict survival. Our results show that our models either outperform or are on par with some of the most commonly used baselines, while either providing a better survival separation (SAE) or being more interpretable (CSAE). Feature selection stability analysis on our models shows a power-law relationship with features commonly associated with survival. The code for this project is available at: https://github.com/phcavelar/coxae.
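As a rough illustration of what a supervised autoencoder objective for survival can look like, the sketch below combines a reconstruction error with a survival term. It is not the authors' implementation (see the repository linked above for that). It assumes the survival term is the negative Cox partial log-likelihood, and the function names, the `survival_weight` parameter, and the use of mean squared error for reconstruction are all hypothetical choices made for this example.

```python
import numpy as np


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    Assumption: Breslow-style risk sets with no special handling of tied times.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still at risk at patient i's time.
    at_risk = time[None, :] >= time[:, None]
    # Log of the summed exp(risk) over each risk set, shifted for stability.
    shift = risk.max()
    weights = np.exp(risk - shift)
    log_risk_set = np.log((at_risk * weights[None, :]).sum(axis=1)) + shift
    # Only patients with an observed event contribute a term.
    n_events = max(int(event.sum()), 1)
    return float(-np.sum((risk - log_risk_set)[event]) / n_events)


def supervised_autoencoder_loss(x, x_reconstructed, risk, time, event,
                                survival_weight=1.0):
    """Hypothetical joint objective: reconstruction error plus survival term."""
    x = np.asarray(x, dtype=float)
    x_reconstructed = np.asarray(x_reconstructed, dtype=float)
    reconstruction = float(np.mean((x - x_reconstructed) ** 2))
    survival = cox_neg_partial_log_likelihood(risk, time, event)
    return reconstruction + survival_weight * survival
```

In this sketch, `risk` stands for the scalar log-hazard predicted from the latent code, so a model that assigns higher risk to patients who die earlier receives a lower survival term.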
References
Asada, K., et al.: Uncovering prognosis-related genes and pathways by multi-omics analysis in lung cancer. Biomolecules 10(4), 524 (2020)
Balın, M.F., Abid, A., Zou, J.: Concrete autoencoders: differentiable feature selection and reconstruction. In: International Conference on Machine Learning, pp. 444–453. PMLR (2019)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the 7th ACM SIGKDD, KDD 2001, pp. 245–250. Association for Computing Machinery, New York (2001). https://doi.org/10.1145/502512.502546
Bode, A.M., Dong, Z.: Precision oncology-the future of personalized cancer medicine? NPJ Precis. Oncol. 1(1), 1–2 (2017). https://doi.org/10.1038/s41698-017-0010-5
Cantini, L., et al.: Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat. Commun. 12(1), 1–12 (2021)
Chaudhary, K., Poirion, O.B., Lu, L., Garmire, L.X.: Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin. Can. Res. 24(6), 1248–1259 (2018)
Ching, T., Zhu, X., Garmire, L.X.: Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 14(4), e1006076 (2018)
Huang, Z., et al.: SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front. Genet. 10, 166 (2019)
Katzman, J.L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., Kluger, Y.: DeepSurv: personalized treatment recommender system using a cox proportional hazards deep neural network. BMC Med. Res. Methodol. 18(1), 1–12 (2018)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: Bengio, Y., LeCun, Y. (eds.) 2nd ICLR, Banff, AB, Canada, 14–16 April 2014, Conference Track Proceedings (2014)
Koch, C.M., et al.: A beginner’s guide to analysis of RNA sequencing data. Am. J. Respir. Cell Mol. Biol. 59(2), 145–157 (2018)
Korsunsky, I., et al.: Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods 16(12), 1289–1296 (2019)
Lamb, J., et al.: The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313(5795), 1929–1935 (2006). https://doi.org/10.1126/science.1132939
Lee, T.Y., Huang, K.Y., Chuang, C.H., Lee, C.Y., Chang, T.H.: Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Comput. Biol. Chem. 87, 107277 (2020)
Maddison, C.J., Mnih, A., Teh, Y.W.: The concrete distribution: a continuous relaxation of discrete random variables. arXiv:1611.00712 [cs, stat] (2017)
Nicora, G., Vitali, F., Dagliati, A., Geifman, N., Bellazzi, R.: Integrated multi-omics analyses in oncology: a review of machine learning methods and tools. Front. Oncol. 10, 1030 (2020)
Poirion, O.B., Jing, Z., Chaudhary, K., Huang, S., Garmire, L.X.: DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 13(1), 1–15 (2021)
King’s College London e-Research Team: King’s Computational Research, Engineering and Technology Environment (CREATE) (2022). https://doi.org/10.18742/RNVF-M076. https://docs.er.kcl.ac.uk/
Ronen, J., Hayat, S., Akalin, A.: Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Sci. Alliance 2(6) (2019)
Tong, L., Mitchel, J., Chatlin, K., Wang, M.D.: Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med. Inform. Decis. Mak. 20(1), 225 (2020). https://doi.org/10.1186/s12911-020-01225-8
Uyar, B., Ronen, J., Franke, V., Gargiulo, G., Akalin, A.: Multi-omics and deep learning provide a multifaceted view of cancer. bioRxiv (2021)
Wissel, D., Rowson, D., Boeva, V.: Hierarchical autoencoder-based integration improves performance in multi-omics cancer survival models through soft modality selection. Technical report, bioRxiv (2022). https://doi.org/10.1101/2021.09.16.460589
Zhang, L., et al.: Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front. Genet. 9, 477 (2018)
Acknowledgements
We would like to thank Dr Jonathan Cardoso-Silva for fruitful conversations, and João Nuno Beleza Oliveira Vidal Lourenço for designing the diagrams. P.H.C.A. acknowledges that during his stay at KCL and A*STAR he is partly funded by King’s College London and the A*STAR Research Attachment Programme (ARAP). The research was also supported by the National Institute for Health Research Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London (IS-BRC-1215-20006). The authors are solely responsible for study design, data collection, analysis, decision to publish, and preparation of the manuscript. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. This work used King’s CREATE compute cluster for its experiments [18]. The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
da Costa Avelar, P.H., Laddach, R., Karagiannis, S.N., Wu, M., Tsoka, S. (2023). Multi-omic Data Integration and Feature Selection for Survival-Based Patient Stratification via Supervised Concrete Autoencoders. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_5
DOI: https://doi.org/10.1007/978-3-031-25891-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science, Computer Science (R0)