Abstract
Gene expression can be perceived as a form of “cell language”, with underlying regulatory mechanisms akin to biological grammar. Decoding this language is critical for understanding cellular functions and behaviors. In this study, we propose a new pre-training paradigm that integrates rich metadata with pre-training tasks, and we develop scMulan, a multitask generative pre-trained language model for single-cell analysis. Guided by different task prompts, scMulan can accomplish multiple tasks in a zero-shot manner, such as cell-type annotation, batch integration, and conditional cell generation. scMulan can also be extended to novel tasks through fine-tuning.
H. Bian, Y. Chen, and X. Dong—Equal contribution.
Full paper preprint (bioRxiv): https://www.biorxiv.org/content/10.1101/2024.01.25.577152v1.
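The abstract frames multitasking as prompt conditioning on a single shared autoregressive model: the task prompt changes while the model and vocabulary stay fixed. The sketch below illustrates that idea for two of the listed tasks. The token names, the expression-binning scheme, and the prompt layout are assumptions chosen for exposition, not scMulan's actual vocabulary or interface.

```python
# Illustrative sketch of prompt-guided multitasking in a "cell language" model.
# All token formats and prompt layouts here are hypothetical, for exposition only.

from typing import Dict, List

N_BINS = 10  # assumed number of discrete expression bins


def bin_expression(value: float, max_value: float = 1.0) -> int:
    """Map a normalized expression value to a discrete bin index."""
    return min(int(value / max_value * N_BINS), N_BINS - 1)


def cell_to_tokens(expr: Dict[str, float]) -> List[str]:
    """Serialize a cell's expression profile as (gene, bin) tokens."""
    return [f"<{gene}:{bin_expression(v)}>" for gene, v in sorted(expr.items())]


def annotation_prompt(expr: Dict[str, float], tissue: str) -> List[str]:
    """Zero-shot cell-type annotation: metadata + expression tokens,
    ending with a query slot for the model to fill autoregressively."""
    return [f"<tissue:{tissue}>", "<task:annotate>"] + cell_to_tokens(expr) + ["<cell_type:?>"]


def generation_prompt(tissue: str, cell_type: str) -> List[str]:
    """Conditional cell generation: metadata tokens only;
    the model would then emit expression tokens one by one."""
    return [f"<tissue:{tissue}>", f"<cell_type:{cell_type}>", "<task:generate>"]


if __name__ == "__main__":
    cell = {"CD3D": 0.8, "CD8A": 0.6, "MS4A1": 0.0}
    print(annotation_prompt(cell, tissue="blood"))
    print(generation_prompt(tissue="blood", cell_type="CD8 T cell"))
```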
Availability.
The pretrained model of scMulan can be found at https://github.com/SuperBianC/scMulan.
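For orientation, a minimal usage sketch follows. The `scmulan` module name and its two functions are hypothetical placeholders; only the scanpy call is a real API. Consult the repository README for the actual interface.

```python
# Minimal usage sketch; `scmulan` and its functions are hypothetical placeholders.
import scanpy as sc  # real library for reading and handling single-cell data
# import scmulan     # hypothetical module name for the released model

# Load a cells-by-genes expression matrix from an AnnData file.
adata = sc.read_h5ad("pbmc.h5ad")

# model = scmulan.load_pretrained("ckpt/scMulan.pt")  # hypothetical loader
# adata.obs["cell_type"] = model.annotate(adata)      # hypothetical zero-shot annotation
```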
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (grants 62373210 and 62250005) and the National Key R&D Program of China (grant 2021YFF1200900).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bian, H. et al. (2024). scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis. In: Ma, J. (ed.) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_57
DOI: https://doi.org/10.1007/978-1-0716-3989-4_57
Publisher Name: Springer, Cham
Print ISBN: 978-1-0716-3988-7
Online ISBN: 978-1-0716-3989-4
eBook Packages: Computer Science, Computer Science (R0)