Abstract
Lung cancer is the leading cause of cancer death among both men and women, mainly because screening programs have low effectiveness and symptoms appear late, usually at advanced disease stages. Lung cancer is highly heterogeneous, and this heterogeneity has repeatedly been linked to its molecular background, which makes it possible to use machine learning approaches to aid both diagnosis and the development of personalized treatments.
In this work we use multiple -omics datasets to assess their usefulness for predicting 2-year survival in lung adenocarcinoma, based on clinical data from 267 patients. Using mRNA and microRNA expression levels, positions of somatic mutations, DNA copy number changes and DNA methylation levels, we developed multiple single-omics and multi-omics classifiers. We also tested various data aggregation and feature selection techniques and show their influence on classification accuracy, measured by the area under the ROC curve (AUC).
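Because every classifier above is compared by its AUC, it may help to recall what that number means: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition (the Mann-Whitney formulation, with ties counted as one half). It is illustrative only: the label encoding (1 for the positive class, for example death within two years) and the example scores are assumptions, not data or code from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 class labels (1 = positive class; the encoding is an assumption)
    scores: classifier scores, higher meaning "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 positive and 3 negative patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic, which is acceptable for a cohort of a few hundred patients; a rank-based implementation gives the same value in O(n log n).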
Our results show not only that molecular data can be used effectively to predict 2-year survival in lung adenocarcinoma (AUC = 0.85), but also that gene expression, methylation and mutation data provide much better predictors than copy number changes and microRNA data. We also compared the classification performance of different dimensionality reduction methods on the copy number variation dataset, the most problematic one, and found that aggregation at the gene and gene set level gives the best classification results.
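Gene and gene set aggregation turns copy number data, which typically arrive as genomic segments with a value per segment, into one feature per gene or per gene set. The sketch below shows one plausible way to do this, assuming half-open coordinates on a single chromosome and a length-weighted mean of overlapping segment values per gene, followed by a plain mean over the members of a gene set. These aggregation rules, the function names and the gene names are hypothetical illustrations, not the procedure used in this study.

```python
from statistics import mean


def gene_copy_number(segments, gene_start, gene_end):
    """Length-weighted mean of segment values overlapping [gene_start, gene_end).

    segments: list of (start, end, value) tuples on the same chromosome,
              half-open intervals; value is e.g. a segment mean (assumption).
    Returns None when no segment overlaps the gene.
    """
    covered = 0
    weighted = 0.0
    for start, end, value in segments:
        overlap = min(end, gene_end) - max(start, gene_start)
        if overlap > 0:
            covered += overlap
            weighted += overlap * value
    return weighted / covered if covered else None


def gene_set_score(gene_values, members):
    """Mean gene-level value over a gene set; genes without a value are skipped."""
    vals = [gene_values[g] for g in members if gene_values.get(g) is not None]
    return mean(vals) if vals else None


# Hypothetical segments and genes.
segments = [(0, 100, 0.0), (100, 300, 1.0)]
gene_values = {
    "GENE_A": gene_copy_number(segments, 50, 150),   # half in each segment
    "GENE_B": gene_copy_number(segments, 200, 250),  # fully in the gained segment
    "GENE_C": gene_copy_number(segments, 400, 500),  # not covered -> None
}
print(gene_values)
print(gene_set_score(gene_values, ["GENE_A", "GENE_B", "GENE_C"]))
```

Collapsing thousands of segments into gene-level or pathway-level features reduces dimensionality while keeping the features biologically interpretable, which is one reason such aggregation can help a classifier.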
Acknowledgements
This work was supported by the Polish National Science Centre (grant number UMO-2020/37/B/ST6/01959) and by Silesian University of Technology statutory research funds. Calculations were performed on the Ziemowit computer cluster in the Laboratory of Bioinformatics and Computational Biology, created in the EU Innovative Economy Programme project POIG.02.01.00–00-166/08 and expanded in project POIG.02.03.01–00-040/13.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jaksik, R., Śmieja, J. (2022). Prediction of Lung Cancer Survival Based on Multiomic Data. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13758. Springer, Cham. https://doi.org/10.1007/978-3-031-21967-2_10
DOI: https://doi.org/10.1007/978-3-031-21967-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21966-5
Online ISBN: 978-3-031-21967-2
eBook Packages: Computer Science, Computer Science (R0)