Abstract
White Matter Hyperintensities (WMH) are important neuroradiological markers of small vessel disease in brain MRI, and automatic WMH segmentation is essential in research and clinical settings for understanding their role in individuals’ health. However, accurate segmentation of WMH is difficult due to their heterogeneous shape, intensity, size and location. Furthermore, image analysts working on different studies have adopted different approaches for providing accurate WMH segmentations, resulting in high inter-analyst variability. We assess the effectiveness of stochastic uncertainty quantification (UQ) techniques for bridging the variability in approaches and criteria in WMH segmentation. We first train six such techniques on an in-house dataset with two segmentation approaches, and then evaluate performance across three studies unseen during training: Mild Stroke Study 3, the Lothian Birth Cohort 1936 and the WMH Challenge dataset. To aid our analysis, we introduce two metrics: Uncertainty Inter Rater Overlap (UIRO) and Joint Uncertainty Error Overlap (JUEO). Our results show that changes in analyst policy between datasets dominate the uncertainty in the WMH segmentation task. Crucially, the distribution of segmentations predicted by stochastic models can fail to match the distribution of segmentations provided by analysts who follow approaches that differ from those used during training. We further suggest how to modify the task and cost function to overcome these difficulties.
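The abstract does not say how the mismatch between predicted and analyst segmentation distributions is quantified, and UIRO and JUEO are defined only in the full paper. As a hedged illustration of this kind of comparison, the sketch below computes the generalized energy distance, a metric commonly used for probabilistic segmentation and based on energy statistics, between masks sampled from a stochastic model and masks drawn by analysts, with 1 - IoU as the per-pair distance. This is an assumption chosen for illustration, not the paper's UIRO or JUEO. The function names and toy masks are hypothetical.

```python
import numpy as np


def iou_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return 1 - IoU for two binary masks (0.0 if both masks are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks agree perfectly.
        return 0.0
    inter = np.logical_and(a, b).sum()
    return float(1.0 - inter / union)


def generalized_energy_distance(samples, annotations) -> float:
    """Squared generalized energy distance between two sets of binary masks.

    D^2 = 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')], with d = 1 - IoU.
    Expectations are plain averages over all pairs (self-pairs included).
    """
    cross = np.mean([iou_distance(s, y) for s in samples for y in annotations])
    within_s = np.mean([iou_distance(s, t) for s in samples for t in samples])
    within_y = np.mean([iou_distance(y, z) for y in annotations for z in annotations])
    return float(2.0 * cross - within_s - within_y)


if __name__ == "__main__":
    # Synthetic 8x8 masks (illustrative only, not data from the paper).
    conservative = np.zeros((8, 8), dtype=bool)
    conservative[2:5, 2:5] = True   # 9-voxel outline
    generous = np.zeros((8, 8), dtype=bool)
    generous[1:6, 1:6] = True       # 25-voxel outline containing the first

    # Model samples follow one segmentation policy; analysts follow another.
    model_samples = [conservative, conservative]
    analysts = [generous, generous]
    print(generalized_energy_distance(model_samples, analysts))
```

In the toy case the model's samples agree with each other (zero within-set distance) yet all differ from the analysts' policy (IoU = 9/25), so the distance is about 1.28. It would be 0 if the sampled masks reproduced the analysts' set exactly. Low spread among model samples therefore does not imply agreement with analysts who follow a different approach.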
Acknowledgements
BP was funded by the United Kingdom Research and Innovation Centre for Doctoral Training in Biomedical AI Programme scholarships (grant EP/S02431X/1). For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising. Funding from the Row Fogo Charitable Trust (Ref No: AD.ROW4.35. BRO-D.FID3668413) and the UK Medical Research Council (UK Dementia Research Institute at the University of Edinburgh, award number UK DRI-4002; G0700704/84698) is also gratefully acknowledged. M.O.B. gratefully acknowledges funding from: Fondation Leducq Transatlantic Network of Excellence (17 CVD 03); EPSRC grant no. EP/X025705/1; British Heart Foundation and The Alan Turing Institute Cardiovascular Data Science Award (C-10180357); Diabetes UK (20/0006221); Fight for Sight (5137/5138); the SCONe projects funded by the Chief Scientist Office, Edinburgh & Lothians Health Foundation, Sight Scotland, the Royal College of Surgeons of Edinburgh, the RS Macdonald Charitable Trust, and Fight for Sight; and the Neurii initiative, a partnership among Eisai Co., Ltd, Gates Ventures, LifeArc and HDR UK. Data collection and processing in the primary studies that provided data were funded by the Wellcome Trust (grant 088134/Z/09/A), the European Union Horizon 2020, PHC-03-15, project No. 666881 SVDs@Target, the Fondation Leducq Transatlantic Network of Excellence for the Study of Perivascular Spaces in Small Vessel Disease, ref no. 16 CVD 05, the Stroke Association, The Alzheimer’s Society UK, the UKRI, and the Scottish Chief Scientist Office through the NHS Lothian Research and Development Department.
Electronic supplementary material is available for this chapter.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Philps, B. et al. (2024). Stochastic Uncertainty Quantification Techniques Fail to Account for Inter-analyst Variability in White Matter Hyperintensity Segmentation. In: Yap, M.H., Kendrick, C., Behera, A., Cootes, T., Zwiggelaar, R. (eds) Medical Image Understanding and Analysis. MIUA 2024. Lecture Notes in Computer Science, vol 14859. Springer, Cham. https://doi.org/10.1007/978-3-031-66955-2_3
DOI: https://doi.org/10.1007/978-3-031-66955-2_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-66954-5
Online ISBN: 978-3-031-66955-2
eBook Packages: Computer Science, Computer Science (R0)