Abstract
The ‘KnowSeq’ R package includes the essential tools for transcriptomic analysis and provides intuitive functions for building efficient, robust pipelines. In this paper, its capabilities are demonstrated on a practical COVID-19 biomarker detection problem using RNA-Sequencing data. Using machine learning techniques, namely feature selection and supervised classification models, a clinical decision system for COVID-19 was developed from four genes proposed as a COVID-19 signature: OAS3, CXCL9, IFITM1 and IFIT3. These four genes are closely related to processes that shape the behaviour of the immune system and its response to viruses such as SARS-CoV-2. The final model reaches an accuracy above 97% when predicting unseen samples.
Acknowledgements
This work was funded by the Government of Andalusia under the Project CV20-64934 titled “Development of an intelligent platform that allows the integration of heterogeneous information sources (imaging, genetics and proteomics) for the characterization and prediction of virulence and pathogenicity in patients with COVID-19”.
Supplementary Information
The open-source code is available at https://github.com/jbajo09/BIOMESIP-COVID19-KNOWSEQ so that researchers can replicate the proposed KnowSeq pipeline.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bajo-Morales, J., Castillo-Secilla, D., Herrera, L.J., Rojas, I. (2021). COVID-19 Biomarkers Detection Using ‘KnowSeq’ R Package. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_37
DOI: https://doi.org/10.1007/978-3-030-88163-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88162-7
Online ISBN: 978-3-030-88163-4
eBook Packages: Computer Science, Computer Science (R0)