Exploring the Transfer Learning Capabilities of CLIP in Domain Generalization for Diabetic Retinopathy

Baliah, Sanoojan; Maani, Fadillah A.; Sanjeev, Santosh; Khan, Muhammad Haris

doi:10.1007/978-3-031-45673-2_44

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14348)

Included in the following conference series:

International Workshop on Machine Learning in Medical Imaging

Abstract

Diabetic Retinopathy (DR), a leading cause of vision impairment, requires early detection and treatment. Developing robust AI models for DR classification holds substantial potential, but a key challenge is ensuring that they generalize to unfamiliar domains with varying data distributions. To address this, our paper investigates cross-domain generalization, also known as domain generalization (DG), within the context of DR classification. DG, a challenging problem in the medical domain, is complicated by the difficulty of gathering labeled data across different domains, such as patient demographics and disease stages. Some recent studies have shown the effectiveness of using CLIP to handle the DG problem in natural images. In this study, we investigate CLIP's transfer learning capabilities and its potential for cross-domain generalization in DR classification, and we carry out comprehensive experiments to assess its efficacy. Further, we introduce a multi-modal fine-tuning strategy named Context Optimization with Learnable Visual Tokens (CoOpLVT), which enhances context optimization by conditioning on visual features. Our findings demonstrate that the proposed method increases the F1-score by 1.8% over the baseline, thus underlining its promise for effective DG in DR classification. Our code is publicly available at https://github.com/Sanoojan/CLIP-DRDG.
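
To make the idea of conditioning context optimization on visual features more concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes a toy setup: learnable context vectors shared across classes, one extra "visual token" produced by a hypothetical linear layer `w_cond` from the image feature, and a stand-in text encoder that simply mean-pools the prompt tokens. The function name, the dimensions, the five-class setting, and the temperature value are all illustrative assumptions.

```python
import math
import random


def linear(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def normalize(v):
    """Scale v to unit L2 norm (left unchanged if the norm is zero)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def conditioned_prompt_logits(image_feat, ctx, class_embs, w_cond, temperature=0.07):
    """Score one image against every class prompt, CLIP-style.

    Each prompt is [learnable context tokens, visual token, class token].
    The visual token depends on the image, so the prompt is image-conditioned.
    """
    # Hypothetical conditioning layer: project the image feature into token space.
    visual_token = linear(image_feat, w_cond)
    image_unit = normalize(image_feat)
    logits = []
    for class_token in class_embs:
        tokens = ctx + [visual_token, class_token]
        # Stand-in for a text encoder (assumption): mean-pool the prompt tokens.
        pooled = [sum(column) / len(tokens) for column in zip(*tokens)]
        cosine = sum(a * b for a, b in zip(normalize(pooled), image_unit))
        logits.append(cosine / temperature)
    return logits


def random_vector(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


random.seed(0)
dim, n_ctx, n_classes = 8, 4, 5  # illustrative sizes; five classes is an assumption
ctx = [random_vector(dim) for _ in range(n_ctx)]            # learnable context
class_embs = [random_vector(dim) for _ in range(n_classes)]  # class-name tokens
w_cond = [random_vector(dim) for _ in range(dim)]            # learnable projection
image_feat = random_vector(dim)                              # frozen image feature

logits = conditioned_prompt_logits(image_feat, ctx, class_embs, w_cond)
print(len(logits), max(range(n_classes), key=lambda k: logits[k]))
```

In a real training loop, `ctx` and `w_cond` would be the trainable parameters, and the mean-pooling stand-in would be replaced by CLIP's text encoder; whether the encoders stay frozen is a design choice this sketch does not settle.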



Author information

Authors and Affiliations

Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Sanoojan Baliah, Fadillah A. Maani, Santosh Sanjeev & Muhammad Haris Khan

Corresponding author

Correspondence to Sanoojan Baliah.

Editor information

Editors and Affiliations

Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xiaohuan Cao
Rensselaer Polytechnic Institute, Troy, NY, USA
Xuanang Xu
Imperial College London, London, UK
Islem Rekik
ShanghaiTech University, Shanghai, China
Zhiming Cui
Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xi Ouyang

1 Electronic supplementary material

Supplementary material 1 (pdf 8614 KB)

Cite this paper

Baliah, S., Maani, F.A., Sanjeev, S., Khan, M.H. (2024). Exploring the Transfer Learning Capabilities of CLIP in Domain Generalization for Diabetic Retinopathy. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14348. Springer, Cham. https://doi.org/10.1007/978-3-031-45673-2_44

DOI: https://doi.org/10.1007/978-3-031-45673-2_44
Published: 15 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45672-5
Online ISBN: 978-3-031-45673-2
eBook Packages: Computer Science, Computer Science (R0)
