Abstract
The integration of orthogonal data modalities greatly supports the interpretation of transcriptomic landscapes in complex tissues. In particular, spatially resolved gene expression profiles are key to understanding tissue organization and function. However, spatial transcriptomics (ST) profiling techniques lack single-cell resolution and must be combined with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, followed by non-negative least squares (NNLS) to deconvolute ST capture locations (spots). Using synthetic spots that simulate varying reference quantities and qualities, we confirmed high prediction accuracy even with shallowly sequenced or small-sized scRNA-seq reference datasets. Training the NMF regression model with sample-matched or external datasets resulted in accurate and sensitive spatial predictions. SPOTlight deconvolution of the mouse brain correctly mapped subtle neuronal cell states of the cortical layers and the defined architecture of the hippocampus. In human pancreatic cancer, we successfully segmented patient sections into healthy and cancerous areas, and further fine-mapped normal and neoplastic cell states. With a model trained on an external pancreatic tumor immune reference, we charted the localization of clinically relevant and tumor-specific immune cell states. Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example of its flexible application spectrum and future potential in digital pathology.
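The two-step idea described above (a marker-seeded NMF on the scRNA-seq reference, then NNLS on each spot) can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function names `seeded_nmf` and `deconvolute_spot`, the toy Poisson reference, the marker-based seeding values, and the simplification of one NMF topic per cell type are all assumptions made for illustration. The NMF uses standard multiplicative updates, and the NNLS step uses `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def seeded_nmf(V, W0, H0, n_iter=200, eps=1e-9):
    """Factorize V (genes x cells) as W @ H with multiplicative updates,
    starting from the seeded matrices W0 (genes x topics) and H0 (topics x cells)."""
    W, H = W0.copy(), H0.copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def deconvolute_spot(W, spot):
    """Fit non-negative topic weights for one spot, then normalize to proportions.
    Assumes one topic per cell type (an illustrative simplification)."""
    coef, _ = nnls(W, spot)
    total = coef.sum()
    return coef / total if total > 0 else coef


# Toy reference (hypothetical): 3 cell types, each with its own block of 10 marker genes.
rng = np.random.default_rng(0)
n_genes, n_types, cells_per_type = 30, 3, 20
profiles = np.full((n_genes, n_types), 0.5)
for k in range(n_types):
    profiles[k * 10:(k + 1) * 10, k] = 10.0
labels = np.repeat(np.arange(n_types), cells_per_type)
V = rng.poisson(profiles[:, labels]).astype(float)

# Seed W from marker genes and H from cell-type membership (small values elsewhere,
# because multiplicative updates can never move an entry away from exactly zero).
W0 = np.full((n_genes, n_types), 0.01)
for k in range(n_types):
    W0[k * 10:(k + 1) * 10, k] = 1.0
H0 = np.full((n_types, V.shape[1]), 0.01)
H0[labels, np.arange(V.shape[1])] = 1.0

W, H = seeded_nmf(V, W0, H0)

# Synthetic spot: a known mixture of the three cell-type profiles.
true_props = np.array([0.6, 0.3, 0.1])
spot = 10 * (profiles @ true_props)
props = deconvolute_spot(W, spot)
print(np.round(props, 2))
```

Seeding ties each topic to one cell type from the start, so the NNLS coefficients for a spot can be read directly as relative cell-type contributions in this toy setting.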
Competing Interest Statement
The authors have declared no competing interest.