Abstract
DNA sequencing has become an extremely popular assay, with researchers claiming that, in the distant future, its impact will equal that of the microscope. Single-cell RNA-seq (scRNA-seq) is an emerging sequencing technology with promising capabilities, but it poses major computational challenges because of the large scale of the data it generates. Since sequencing costs are constantly decreasing, the volume and complexity of the data generated by these technologies will keep increasing. As a result, major computational challenges arise at the cell level, particularly from the ultra-high dimensionality of scRNA-seq data. These challenges relate to three pillars of machine learning (ML) analysis: classification, clustering, and visualization. Although there has been remarkable progress in ML methods for scRNA-seq data analysis, numerous questions remain unresolved. This review surveys state-of-the-art classification, clustering, and visualization methods tailored to single-cell transcriptomics data.
Acknowledgements
This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No 1901.
Copyright information
© 2020 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Vrahatis, A.G., Tasoulis, S.K., Maglogiannis, I., Plagianakos, V.P. (2020). Recent Machine Learning Approaches for Single-Cell RNA-seq Data Analysis. In: Maglogiannis, I., Brahnam, S., Jain, L. (eds) Advanced Computational Intelligence in Healthcare-7. Studies in Computational Intelligence, vol 891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-61114-2_5
DOI: https://doi.org/10.1007/978-3-662-61114-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-61112-8
Online ISBN: 978-3-662-61114-2
eBook Packages: Intelligent Technologies and Robotics (R0)