dropClust: efficient clustering of ultra-large scRNA-seq data

Nucleic Acids Res. 2018 Apr 6;46(6):e36.
doi: 10.1093/nar/gky007.

Debajyoti Sinha et al. Nucleic Acids Res.

Abstract

Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique, to develop a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
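
To illustrate the idea of Locality Sensitive Hashing mentioned in the abstract, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is not taken from the paper and may differ from the scheme dropClust actually uses; the function names, the toy expression vectors and the parameters `n_bits`, `seed` and `k` are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the side it falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(vectors, planes):
    """Group vector indices into buckets keyed by their LSH signature."""
    buckets = defaultdict(list)
    for i, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(i)
    return buckets


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(vec, vectors, planes, buckets, k=5):
    """Approximate neighbours: rank only the vectors that share the query's bucket."""
    candidates = buckets.get(signature(vec, planes), [])
    ranked = sorted(candidates, key=lambda i: cosine(vec, vectors[i]), reverse=True)
    return ranked[:k]


# Hypothetical toy "expression profiles" (4 cells, 3 genes).
cells = [[5.0, 0.0, 1.0], [4.0, 1.0, 0.0], [0.0, 6.0, 2.0], [1.0, 5.0, 3.0]]
planes = make_hyperplanes(n_bits=4, dim=3, seed=0)
buckets = build_index(cells, planes)
print(query(cells[0], cells, planes, buckets, k=2))
```

Only the cells in the query's bucket are compared exactly, which is what makes the search approximate and fast: neighbours that fall in a different bucket are missed. Practical schemes typically combine several hash tables to reduce such misses.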

Figures

Figure 1.
(A) 2D embedding of 20K PBMC transcriptomes, chosen randomly from the complete dataset. Separate colours are used for the Louvain-predicted clusters. (B) For each cluster, the number of sampled cells is shown using both SPS and random sampling. The sizes of the Louvain clusters are indicated by the sizes of the bubbles. The X-axis shows the cluster ID, whereas the Y-axis shows the sampling fraction. The true number of cells is also indicated on each of the bubbles. In this case, 500 transcriptomes are sampled through SPS. (C) Similar figure with a sample size of 2000.
Figure 2.
Barplot depicting the number of estimated Gaussian components for each of the top 50 principal components derived from the PBMC data.
Figure 3.
Bars show the ARI indexes obtained by comparing clustering outcomes with cell-type annotations.
Figure 4.
Localization of PBMC transcriptomes of the same type (based on annotation) on the 2D embedding produced by dropClust. Each sub-figure corresponds to one of the well-known immune cell types considered for benchmarking clustering accuracy by Zheng et al. (2).
Figure 5.
Clustering of ∼68K PBMC data. dropClust-based visualization (a modified version of tSNE) of the transcriptomes. Fourteen clusters retrieved by the algorithm are marked with their respective cluster IDs. Legends show the names of the inferred cell types.
Figure 6.
Trend of increase in analysis time (preprocessing, clustering and visualization) for different pipelines with a growing number of transcriptomes under analysis.
Figure 7.
Detectability of minor cell types. Bars show the average F1-scores obtained on 10 simulated datasets at each concentration of the minor population. A dataset containing a mixture of Jurkat and 293T cells was used for this study.
Figure 8.
(A) Boxplots depicting average Silhouette scores computed on 100 bootstrap samples from the mouse retina cell data (7). A separate boxplot is used for each concerned clustering method. (B) Similar plots for the mouse ESC dataset (10).
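
Figure 3 compares clustering outcomes with cell-type annotations using the Adjusted Rand Index (ARI). As a rough illustration of how that score is computed, the following sketch implements the standard pair-counting ARI formula in plain Python. The function name and the toy labels are assumptions for illustration; this is not the authors' evaluation code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat the partitions as identical

    # Pairs placed together in both labelings (contingency-table cells).
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs placed together in each labeling on its own.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical annotations versus predicted cluster IDs for six cells.
annotation = ["B", "B", "T", "T", "NK", "NK"]
clusters = [1, 1, 2, 2, 3, 3]
print(adjusted_rand_index(annotation, clusters))
```

The score is 1 when the two partitions agree up to a renaming of cluster labels and is close to 0 for random assignments, so it does not reward a method merely for producing many small clusters.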

References

    1. Tanay A., Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature. 2017; 541:331–338.
    2. Zheng G.X.Y., Terry J.M., Belgrader P., Ryvkin P., Bent Z.W., Wilson R., Ziraldo S.B., Wheeler T.D., McDermott G.P., Zhu J. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017; 8:14049.
    3. Li H., Courtois E.T., Sengupta D., Tan Y., Chen K.H., Goh J.J.L., Kong S.L., Chua C., Hon L.K., Tan W.S. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 2017; 49:708–718.
    4. Kiselev V.Y., Kirschner K., Schaub M.T., Andrews T., Yiu A., Chandra T., Natarajan K.N., Reik W., Barahona M., Green A.R. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods. 2017; 14:483–486.
    5. Zeisel A., Munoz-Manchado A.B., Codeluppi S., Lonnerberg P., La Manno G., Jureus A., Marques S., Munguba H., He L., Betsholtz C. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015; 347:1138–1142.
