Improving Data-Efficiency and Robustness of Medical Imaging Segmentation Using Inpainting-Based Self-Supervised Learning

Jeffrey Dominic et al. Bioengineering (Basel). 2023 Feb 4;10(2):207. doi: 10.3390/bioengineering10020207.

Abstract

We systematically evaluate the training methodology and efficacy of two inpainting-based pretext tasks, context prediction and context restoration, for medical image segmentation using self-supervised learning (SSL). Multiple versions of self-supervised U-Net models were trained to segment MRI and CT datasets, each using a different combination of design choices and pretext tasks to determine the effect of these design choices on segmentation performance. The optimal design choices were used to train SSL models that were then compared with baseline supervised models for computing clinically relevant metrics in label-limited scenarios. We observed that SSL pretraining with context restoration using 32 × 32 patches and Poisson-disc sampling, transferring only the pretrained encoder weights, and fine-tuning immediately with an initial learning rate of 1 × 10⁻³ provided the most benefit over supervised learning for MRI and CT tissue segmentation accuracy (p < 0.001). For both datasets and most label-limited scenarios, scaling the size of the unlabeled pretraining data improved segmentation performance. SSL models pretrained with this larger amount of data outperformed baseline supervised models in the computation of clinically relevant metrics, especially when the performance of supervised learning was low. Our results demonstrate that SSL pretraining using inpainting-based pretext tasks can help increase the robustness of models in label-limited scenarios and reduce worst-case errors that occur with supervised learning.
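
To make the winning recipe concrete, the following is a minimal sketch of inpainting-based pretraining, assuming PyTorch; the model, data loader, and corrupt callable are illustrative placeholders rather than the authors' code (example corruption functions are sketched after Figure 1). After pretraining, only the encoder weights would be copied into the segmentation model, which is then fine-tuned immediately.

import torch
import torch.nn as nn

def pretrain_inpainting(model, loader, corrupt, epochs=100, lr=1e-3, device="cpu"):
    """Pretrain a U-Net to undo an image corruption using unlabeled images only."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # initial learning rate of 1e-3
    mse = nn.MSELoss()  # pixel-wise reconstruction loss
    for _ in range(epochs):
        for clean in loader:                 # batches of shape (B, C, H, W)
            clean = clean.to(device)
            loss = mse(model(corrupt(clean)), clean)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model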

Keywords: CT; MRI; deep learning; machine learning; segmentation; self-supervised learning.


Conflict of interest statement

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure A1
Box plots displaying the spread of Dice scores among the volumes in the MRI test set. The top row displays the spread of Dice scores after each type of model was trained once, with the initial learning rate set to the value indicated on the x-axis. The remaining four rows display the spread of Dice scores after each model in the first row was trained a second time, with the initial learning rate set to the value indicated on the x-axis. We used the following acronym structure to distinguish between the different types of models: ABC. If A is F, the pretrained weights were fine-tuned immediately; if A is FF, the pretrained weights were first frozen and then fine-tuned. If B is E, only the pretrained encoder weights were transferred; if B is B, both the pretrained encoder and decoder weights were transferred. If C is F, the model was trained only once (the first training run); if C is S, the model was trained a second time (the second training run). A possible implementation of these choices is sketched below.
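
To make the A and B axes concrete, a possible implementation of the weight-transfer and freezing logic is sketched below, assuming a PyTorch segmentation model whose encoder parameter names share an "encoder." prefix (the prefix and function names are assumptions for illustration, not taken from the paper).

def transfer_pretrained(seg_model, pretrained_state, encoder_only=True):
    """Copy matching weights from the pretrained inpainting model into the
    segmentation model: "E" transfers the encoder weights only, while "B"
    transfers both encoder and decoder weights."""
    own = seg_model.state_dict()
    for name, tensor in pretrained_state.items():
        if encoder_only and not name.startswith("encoder."):
            continue  # skip decoder/post-processing weights for "E"
        if name in own and own[name].shape == tensor.shape:
            own[name] = tensor
    seg_model.load_state_dict(own)

def set_transferred_frozen(seg_model, frozen):
    """"F" fine-tunes immediately (frozen=False from the start); "FF" first
    freezes the transferred encoder weights, then unfreezes them later."""
    for name, p in seg_model.named_parameters():
        if name.startswith("encoder."):
            p.requires_grad = not frozen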
Figure A2
The downstream segmentation performance on the MRI dataset for the Context Prediction pretext task as measured by the Dice score for every combination of patch size and sampling method used during pretraining, evaluated in five different scenarios of training data availability. In each scenario, every model is trained for segmentation using one of the five different subsets of training data as described in Section 2.1.1. The black dotted line in each plot indicates the performance of a fully-supervised model trained using all available training images. The light blue curve indicates the performance of a fully-supervised model when trained using each of the five different subsets of training data.
Figure A3
The downstream segmentation performance on the CT dataset for the Context Prediction pretext task as measured by the Dice score for every combination of patch size and sampling method used during pretraining, evaluated in five different scenarios of training data availability. In each scenario, every model is trained for segmentation using one of the five different subsets of training data as described in Section 2.1.2. The black dotted line in each plot indicates the performance of a fully-supervised model trained using all available training images. The light blue curve indicates the performance of a fully-supervised model when trained using each of the five different subsets of training data.
Figure 1
Example ground truth segmentations for the MRI and CT datasets (both with dimensions 512 × 512), and example image corruptions for context prediction (zeroing image patches) and context restoration (swapping image patches). Since image corruption happens after normalization, the zeroed-out image patches for context prediction were actually replaced with the mean of the image. The “Inpainting” section depicts image corruptions with four different patch sizes: 64 × 64, 32 × 32, 16 × 16, and 8 × 8. The locations of these patches were determined using Poisson-disc sampling to prevent the randomly placed patches from overlapping. A sketch of both corruptions is given below.
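
Below is a sketch of the two corruptions, assuming NumPy images of shape (H, W); the function names are illustrative, and a simple minimum-distance rejection sampler stands in for a full Poisson-disc implementation.

import numpy as np

def sample_patch_corners(h, w, patch, n, rng):
    """Sample up to n top-left corners whose pairwise distance is at least
    patch * sqrt(2), which guarantees the patches cannot overlap. This is a
    simple rejection sampler standing in for true Poisson-disc sampling."""
    min_dist = patch * 2 ** 0.5
    corners = []
    for _ in range(10000):  # rejection-sampling budget
        p = rng.integers(0, [h - patch, w - patch])
        if all(np.hypot(*(p - q)) >= min_dist for q in corners):
            corners.append(p)
            if len(corners) == n:
                break
    return corners

def context_prediction(img, patch=32, n=10, rng=None):
    """Replace sampled patches with the image mean (the post-normalization
    equivalent of zeroing, as noted in the caption)."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    for y, x in sample_patch_corners(*img.shape, patch, n, rng):
        out[y:y + patch, x:x + patch] = img.mean()
    return out

def context_restoration(img, patch=32, n=10, rng=None):
    """Swap consecutive pairs of sampled patches."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    corners = sample_patch_corners(*img.shape, patch, n, rng)
    for (y1, x1), (y2, x2) in zip(corners[0::2], corners[1::2]):
        tmp = out[y1:y1 + patch, x1:x1 + patch].copy()
        out[y1:y1 + patch, x1:x1 + patch] = out[y2:y2 + patch, x2:x2 + patch]
        out[y2:y2 + patch, x2:x2 + patch] = tmp
    return out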
Figure 2
The U-Net architecture used for both inpainting and segmentation, which includes layers grouped into three categories: the “encoder” (in red), the “decoder” (in blue), and the “post-processing” layer (the final convolutional layer). Each dotted rectangular box represents a feature map from the encoder that was concatenated to the first feature map in the decoder at the same level.
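
A compact U-Net sketch with this encoder/decoder/post-processing grouping follows, in PyTorch; the depth and channel widths here are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class UNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, widths=(32, 64, 128)):
        super().__init__()
        self.encoder = nn.ModuleList(
            [block(in_ch, widths[0])] +
            [block(a, b) for a, b in zip(widths, widths[1:])])
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(b, a, 2, stride=2)
             for a, b in zip(widths, widths[1:])][::-1])
        self.decoder = nn.ModuleList(
            [block(2 * a, a) for a in widths[:-1]][::-1])
        self.post = nn.Conv2d(widths[0], out_ch, 1)  # "post-processing" layer

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.encoder):
            x = enc(x)
            if i < len(self.encoder) - 1:
                skips.append(x)  # the dotted boxes in Figure 2
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.decoder, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.post(x)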
Figure 3
The downstream segmentation performance on the MRI dataset for the Context Restoration pretext task as measured by the Dice score for every combination of patch size and sampling method used during pretraining, evaluated in five different scenarios of training data availability. In each scenario, every model is trained for segmentation using one of the five different subsets of training data as described in Section 2.1.1. The black dotted line in each plot indicates the performance of a fully-supervised model trained using all available training images. The light blue curve indicates the performance of a fully-supervised model when trained using each of the five different subsets of training data. Similar plots for the Context Prediction pretext task are given in Appendix C.
Figure 4
The downstream segmentation performance on the CT dataset for the Context Restoration pretext task as measured by the Dice score for every combination of patch size and sampling method used during pretraining, evaluated in five different scenarios of training data availability. In each scenario, every model is trained for segmentation using one of the five different subsets of training data as described in Section 2.1.2. The black dotted line in each plot indicates the performance of a fully-supervised model trained using all available training images. The light blue curve indicates the performance of a fully-supervised model when trained using each of the five different subsets of training data. Similar plots for the Context Prediction pretext task are given in Appendix C.
Figure 5
The downstream segmentation performance of the optimally trained model when pretrained with different amounts of pretraining data and fine-tuned using each of the five training data subsets. 100% pretraining data refers to the regular training set for each dataset. The data point for 0% pretraining data is the performance of a fully-supervised model. The black dotted line indicates the performance of a fully-supervised model trained on all available training data for the appropriate dataset. The other dotted lines are the best-fit curves for each of the training data subsets, modeled as a power-law relationship of the form y = ax^k + c. The values of a, k, c, and the Residual Standard Error (S) for the best-fit curves are displayed in the two tables.
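
A sketch of how such a power-law fit and the Residual Standard Error S can be computed with SciPy; the data points below are placeholders, not values from the paper.

import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, k, c):
    return a * np.power(x, k) + c

# Placeholder data: fraction of pretraining data vs. Dice score
# (not values from the paper).
x = np.array([0.25, 0.5, 1.0, 2.0])
y = np.array([0.80, 0.84, 0.87, 0.89])

(a, k, c), _ = curve_fit(power_law, x, y, p0=(-0.05, -0.5, 0.92), maxfev=10000)
residuals = y - power_law(x, a, k, c)
# Residual Standard Error with 3 fitted parameters
S = np.sqrt(np.sum(residuals ** 2) / (len(x) - 3))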
Figure 6
A comparison of the percent error in calculating clinical metrics for the MRI and CT datasets when the tissue segmentations are generated by fully-supervised models versus by optimally trained models pretrained using 200% data for MRI and 1200% data for CT. Each bar represents the median percent error across the test set for a particular tissue, clinical metric, and label regime. The percent error in the calculation of tissue cross-sectional area and mean HU for intramuscular fat extends beyond the limits of the y-axis when 10% or 5% labeled training data is used for segmentation.
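
To make the metrics concrete, a minimal sketch of how such clinical metrics and their percent error could be computed from a segmentation mask (the helper names and inputs are assumptions, not the authors' pipeline).

import numpy as np

def cross_sectional_area(mask, pixel_area_mm2):
    """Tissue cross-sectional area in mm^2 from a binary segmentation mask."""
    return mask.sum() * pixel_area_mm2

def mean_hu(ct_image, mask):
    """Mean Hounsfield units over the segmented tissue region."""
    return ct_image[mask.astype(bool)].mean()

def percent_error(predicted, reference):
    """Percent error of a metric computed from a predicted segmentation
    relative to the same metric computed from the ground truth."""
    return 100.0 * abs(predicted - reference) / abs(reference)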
Figure 7
The relationship between the percent error when using supervised learning and the percent error when using SSL. Each blue point represents an image in the test set for the appropriate dataset. The percent error was averaged over all classes and label-limited scenarios. For CT, the intramuscular fat was excluded to prevent large percent error values. For MRI T2 relaxation time, one point with a high percent error for supervised learning was excluded to reduce the range of the x-axis.
