Abstract
Lung cancer is one of the most frequent cancer types and one of the leading causes of cancer death worldwide. To improve cancer diagnosis, patients now undergo more screenings, and data from several biological sources are collected for the same patient. Fusing the information provided by these sources can lead to a more robust diagnosis, which in turn can improve the patient's prognosis. In this work, we compare two fusion methodologies (early and intermediate) using RNA-Seq and Copy Number Variation data for Non-Small-Cell Lung Cancer classification. Both methodologies achieve strong results, with an AUC of 0.984 for early fusion and 0.989 for intermediate fusion, improving on those obtained from each source of information independently (0.978 for RNA-Seq and 0.910 for Copy Number Variation). This work shows that fusion methodologies can enhance the classification of non-small-cell lung cancer and suggests that they are promising for the diagnosis of other cancer types.
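The abstract does not describe the models used, so the following is only a minimal sketch of how early and intermediate fusion differ structurally, not the authors' implementation. It assumes the usual definitions: early fusion concatenates the raw features of both sources before a single model, while intermediate fusion first maps each source to its own learned representation and concatenates those. The function names, the fixed encoder weights, and the toy feature values are all hypothetical; in practice the encoders would be trained together with the classifier.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def linear_encoder(weights: Sequence[Sequence[float]]) -> Encoder:
    """Build a toy encoder computing ReLU(W x) with fixed (untrained) weights."""
    def encode(x: Sequence[float]) -> Vector:
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return encode


def early_fusion(rna: Sequence[float], cnv: Sequence[float]) -> Vector:
    """Early fusion: concatenate the raw features of both sources."""
    return list(rna) + list(cnv)


def intermediate_fusion(
    rna: Sequence[float],
    cnv: Sequence[float],
    encode_rna: Encoder,
    encode_cnv: Encoder,
) -> Vector:
    """Intermediate fusion: encode each source separately, then concatenate."""
    return encode_rna(rna) + encode_cnv(cnv)


# Hypothetical toy sample: 3 RNA-Seq features and 2 CNV features.
rna_sample = [2.0, 0.5, 1.0]
cnv_sample = [-1.0, 1.0]

# Hypothetical encoders; real ones would be learned from data.
encode_rna = linear_encoder([[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]])
encode_cnv = linear_encoder([[1.0, 1.0]])

early_features = early_fusion(rna_sample, cnv_sample)
intermediate_features = intermediate_fusion(
    rna_sample, cnv_sample, encode_rna, encode_cnv
)

print(early_features)         # input to a single classifier
print(intermediate_features)  # input to the classification head
```

In both cases the fused vector is passed to one classifier; the difference lies in whether the sources are combined before or after source-specific processing.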
Acknowledgements
The results published here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
This work was funded by the Spanish Ministry of Sciences, Innovation and Universities under Grant RTI2018-101674-B-I00 as part of project “Computer Architectures and Machine Learning-based solutions for complex challenges in Bioinformatics, Biotechnology and Biomedicine” and by the Government of Andalusia under the grant CV20-64934 as part of the project “Development of an intelligence platform for the integration of heterogenous sources of information (images, genetic information and proteomics) for the characterization and prediction of COVID-19 patients’ virulence and pathogenicity”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Carrillo-Perez, F., Morales, J.C., Castillo-Secilla, D., Guillen, A., Rojas, I., Herrera, L.J. (2021). Comparison of Fusion Methodologies Using CNV and RNA-Seq for Cancer Classification: A Case Study on Non-Small-Cell Lung Cancer. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_29
DOI: https://doi.org/10.1007/978-3-030-88163-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88162-7
Online ISBN: 978-3-030-88163-4
eBook Packages: Computer Science, Computer Science (R0)