
scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis

  • Conference paper
  • In: Research in Computational Molecular Biology (RECOMB 2024)

Abstract

Gene expression can be perceived as a form of “cell language”, with underlying regulatory mechanisms akin to biological grammar. Decoding this language is critical for understanding cellular functions and behaviors. In this study, we proposed a new pre-training paradigm that integrates rich metadata and multiple pre-training tasks, and developed scMulan, a multitask generative pre-trained language model for single-cell analysis. Guided by different task prompts, scMulan can accomplish multiple tasks in a zero-shot manner, such as cell-type annotation, batch integration, and conditional cell generation. scMulan can also be extended to novel tasks through fine-tuning.
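The core idea of the abstract, serializing a cell into a token sequence in which metadata tokens and a task prompt precede the gene-expression tokens, so that one generative model can switch tasks by prompt, can be illustrated with a toy sketch. The function name, token format, and expression-binning rule below are illustrative assumptions, not scMulan's actual implementation:

```python
# Toy sketch of a "cell sentence": a task prompt and metadata tokens are
# prepended to binned gene-expression tokens, mimicking (loosely) how a
# prompt-conditioned generative model could handle multiple tasks.

def build_cell_sentence(task, metadata, expression, top_k=3):
    """Serialize one cell as tokens: <task:...> <meta:...> <gene:level> ..."""
    tokens = [f"<task:{task}>"]
    # Metadata (e.g. organ, sequencing technology) becomes conditioning tokens.
    tokens += [f"<{key}:{value}>" for key, value in sorted(metadata.items())]
    # Keep the top-k expressed genes, binned into coarse expression levels.
    top = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for gene, value in top:
        level = "high" if value > 5 else "low"
        tokens.append(f"<{gene}:{level}>")
    return tokens

cell = {"CD3D": 8.1, "CD8A": 6.4, "MS4A1": 0.2, "GNLY": 3.0}
meta = {"organ": "blood", "technology": "10x"}
print(build_cell_sentence("cell_type_annotation", meta, cell))
# → ['<task:cell_type_annotation>', '<organ:blood>', '<technology:10x>',
#    '<CD3D:high>', '<CD8A:high>', '<GNLY:low>']
```

Changing only the leading task token (e.g. to a generation prompt with target metadata) would re-purpose the same sequence format for conditional cell generation, which is the sense in which a single model can be "guided by different task prompts".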

H. Bian, Y. Chen, and X. Dong—Equal contribution.

Full paper preprint (bioRxiv): https://www.biorxiv.org/content/10.1101/2024.01.25.577152v1.



Availability

The pretrained model of scMulan can be found at https://github.com/SuperBianC/scMulan.


Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (grants 62373210 and 62250005) and the National Key R&D Program of China (grant 2021YFF1200900).

Author information


Correspondence to Lei Wei or Xuegong Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bian, H. et al. (2024). scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_57


  • DOI: https://doi.org/10.1007/978-1-0716-3989-4_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-1-0716-3988-7

  • Online ISBN: 978-1-0716-3989-4

  • eBook Packages: Computer Science (R0)
