Abstract
Diabetic retinopathy (DR), a leading cause of vision impairment, requires early detection and treatment. Robust AI models for DR classification hold substantial potential, but a key challenge is ensuring that they generalize to unseen domains whose data distributions differ from the training data. To address this, our paper investigates cross-domain generalization, also known as domain generalization (DG), for DR classification. DG is especially challenging in the medical setting because labeled data is difficult to gather across domains that differ in, for example, patient demographics and disease stages. Recent studies have shown that CLIP is effective for DG on natural images. In this study, we investigate CLIP's transfer learning capabilities and carry out comprehensive experiments to assess its potential for DG in DR classification. Further, we introduce a multi-modal fine-tuning strategy named Context Optimization with Learnable Visual Tokens (CoOpLVT), which enhances context optimization by conditioning on visual features. Our findings demonstrate that the proposed method increases the F1-score by 1.8% over the baseline, underlining its promise for effective DG in DR classification. Our code is publicly available at https://github.com/Sanoojan/CLIP-DRDG.
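To make the idea of conditioning context optimization on visual features more concrete, the following toy sketch shifts a set of learnable context tokens by a vector computed from an image feature, in the spirit of conditional prompt learning. It is not the authors' CoOpLVT implementation (see the repository above for that). The function names, the single linear layer, and the toy dimensions are all assumptions made for illustration.

```python
import random


def linear(x, weights, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def condition_context(ctx_tokens, image_feature, weights, bias):
    """Shift every learnable context token by a vector derived from the image.

    Hypothetical helper: a single linear layer stands in for whatever
    network maps visual features into the prompt-token space.
    """
    shift = linear(image_feature, weights, bias)
    return [[c + s for c, s in zip(token, shift)] for token in ctx_tokens]


random.seed(0)
feat_dim, token_dim, n_ctx = 8, 4, 3  # toy sizes, chosen only for illustration

# Learnable context tokens (in practice trained by backpropagation).
ctx_tokens = [[random.gauss(0, 0.02) for _ in range(token_dim)]
              for _ in range(n_ctx)]
# Parameters of the assumed conditioning layer.
weights = [[random.gauss(0, 0.1) for _ in range(feat_dim)]
           for _ in range(token_dim)]
bias = [0.0] * token_dim

# Stand-ins for image-encoder features of two different fundus images.
image_a = [random.random() for _ in range(feat_dim)]
image_b = [random.random() for _ in range(feat_dim)]

ctx_a = condition_context(ctx_tokens, image_a, weights, bias)
ctx_b = condition_context(ctx_tokens, image_b, weights, bias)

print(len(ctx_a), len(ctx_a[0]))  # number of tokens, token dimension
print(ctx_a != ctx_b)             # whether the two images got different contexts
```

In a full pipeline, the conditioned tokens would be concatenated with the class-name embeddings and passed through the text encoder, so that each image is scored against prompts adapted to its own visual content.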
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baliah, S., Maani, F.A., Sanjeev, S., Khan, M.H. (2024). Exploring the Transfer Learning Capabilities of CLIP in Domain Generalization for Diabetic Retinopathy. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14348. Springer, Cham. https://doi.org/10.1007/978-3-031-45673-2_44
DOI: https://doi.org/10.1007/978-3-031-45673-2_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45672-5
Online ISBN: 978-3-031-45673-2
eBook Packages: Computer Science (R0)