1 Introduction

Accurate segmentation of cardiac magnetic resonance (CMR) images is fundamental for assessing cardiac morphology and diagnosing heart conditions [10]. Manual segmentation of the anatomical structures is tedious, time-consuming and prone to subjective errors, which makes it unsuitable for large-scale studies such as the UK Biobank [1]. Therefore, it is essential to develop automated, fast and accurate CMR segmentation techniques.

Recently, convolutional neural network (CNN) based methods have achieved very good performance for cardiac image segmentation in terms of both speed and accuracy [1, 2, 12]. However, they may still produce sub-optimal segmentation results in some circumstances. For example, in the Automatic Cardiac Diagnosis Challenge (ACDC) [2], the top segmentation methods (all CNN-based) achieve high overall segmentation scores for mid-ventricular short-axis (SA) slices. However, they sometimes produce poor results or even fail to locate the myocardium in basal slices (due to its more complex shape) and apical slices (due to its small size). This problem is not uncommon and has been reported in the related literature [2, 7, 15]. Methods based on 2D networks, trained in a slice-by-slice fashion, are particularly affected by this problem since they do not incorporate spatial context from neighbouring SA slices or long-axis (LA) views. On the other hand, 3D networks are capable of incorporating 3D spatial information to perform the segmentation task. Yet the 3D spatial context can be affected by potential inter-slice motion artefacts [13] and the low through-plane spatial resolution of cardiac SA stacks, which limits their segmentation performance. Compared to 2D networks, 3D networks usually contain more parameters and are prone to over-fitting, especially when the training set is limited in size, since they use 3D volumes rather than 2D slices as input, which significantly reduces the number of training samples.

Experienced clinicians are able to assess the cardiac morphology and function from multiple standard views, using both SA and LA images to form an understanding of the cardiac anatomy. Inspired by this, we propose a method which learns the anatomical prior knowledge across four standard views and leverages this to perform segmentation on 2D SA images. The intuition behind our work is that the representation learnt from multiple standard views is beneficial for the segmentation task on the SA slices as different views should share the same representation of the 3D anatomy if they are from the same subject.

The main contributions of this paper are the following: (a) we develop a novel autoencoder architecture (Shape MAE) which learns latent representations of cardiac shapes from multiple standard views; (b) we develop a segmentation network (multi-view U-Net, adapted from [11]) which is capable of incorporating the anatomical shape priors learned from multi-view images to guide the segmentation of SA images; (c) we assess the segmentation accuracy and the data efficiency of the proposed method against common 2D and 3D segmentation baselines by limiting the number of training images, demonstrating that the proposed method is more robust and less dependent on the size of the training set.

Related Literature. A large number of methods have been developed to improve the robustness of cardiac segmentation. One approach is to learn an ensemble model where the predictions of a 2D and a 3D network are combined [6]. This method is capable of producing accurate results, but has a relatively high computational cost and requires an extra post-processing step to merge the predictions from the two networks. Another approach is to incorporate cardiac anatomical prior knowledge into segmentation networks [5, 9]. In [9], a learned representation of the 3D cardiac shape is employed to constrain the segmentation model to predict anatomically plausible shapes. The main bottleneck of this method is the requirement of fully annotated 3D high-resolution CMR images which are free from inter-slice motion artefacts and have high through-plane spatial resolution. However, compared to the standard 2D imaging protocol, the 3D one requires subjects to hold their breath for a relatively long time and is therefore often not feasible for patients with cardiovascular diseases. Instead of using 3D images, we exploit routinely acquired 2D standard views to learn the shape representation of the cardiac structures. The learned representation is then injected into a segmentation network to improve its performance on SA CMR images. Of note, the approach in [8] also injects shape priors produced by an autoencoder into a segmentation network. However, the aim of that approach is to generate multiple segmentation hypotheses for ambiguous images, and it cannot be readily employed to learn shape priors from different views to enhance cardiac segmentation.

2 Methods

The proposed method consists of two novel architectures: (1) a shape-aware multi-view autoencoder (Shape MAE), which learns anatomical shape priors from standard cardiac acquisition planes, including short-axis and long-axis views, and (2) a multi-view U-Net, which performs cardiac short-axis image segmentation by incorporating the anatomical priors learned by Shape MAE into a modified U-Net architecture.

Fig. 1. (a) Overview of Shape MAE. (b) Detailed architecture of each encoder and each decoder. Each rectangle represents one or a series of convolutional (Conv) or transposed convolutional (Deconv) layers, where the number in the square box represents the number of filters per layer. A ‘Res_block’ (pink rectangles) consists of two convolutional layers (\(3\times 3\)) with a residual connection which adds its input to the features from the second layer. Instance normalisation and leaky ReLU activations are applied throughout the network. A sigmoid function is applied to the latent code z to bound its range.

Shape MAE: Shape-Aware Multi-view Autoencoder. As illustrated in Fig. 1, we first present a novel architecture named shape-aware multi-view autoencoder (Shape MAE), which learns anatomical shape priors from standard cardiac views through multi-task learning. Given a source view \(X_i\), the network learns the low-dimensional representation \(z_i\) of \(X_i\) that best reconstructs the segmentations \(Y_j\) of all target views. In this work, we employ four source views \(X_i \; (i=1,\dots , 4)\): three LA views - the two-chamber view (LA1), the three-chamber view (LA2) and the four-chamber view (LA3) - and one mid-ventricular SA slice (Mid-V). The target view segmentations \(Y_j\) (\(j=1,\dots , 6\)) correspond to the four source views plus two additional SA slices: an apical one and a basal one. All encoders \(E_i: z_i=E_i(X_i)\) and all decoders \(D_j: \hat{Y}_{i\rightarrow j}=D_j(z_i)\) in Shape MAE share the same architecture (see Fig. 1b).
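For illustration, the following PyTorch sketch shows one possible encoder/decoder pair consistent with the description above and with Fig. 1b (Res_blocks of two \(3\times 3\) convolutions with a residual connection, instance normalisation, leaky ReLU, and a sigmoid on the latent code). The exact filter counts, the number of down-sampling stages and the \(4\times 8\times 8\) latent shape are assumptions made for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block input."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class Encoder(nn.Module):
    """Maps a 1x128x128 view X_i to a latent shape code z_i (assumed 4x8x8)."""
    def __init__(self, in_ch=1, latent_ch=4):
        super().__init__()
        layers, ch = [nn.Conv2d(in_ch, 16, 3, padding=1),
                      nn.InstanceNorm2d(16), nn.LeakyReLU(0.2, inplace=True)], 16
        for _ in range(4):  # 128 -> 64 -> 32 -> 16 -> 8
            layers += [ResBlock(ch),
                       nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True)]
            ch *= 2
        layers += [nn.Conv2d(ch, latent_ch, 3, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # z_i in (0, 1), shape (B, 4, 8, 8)


class Decoder(nn.Module):
    """Reconstructs a target-view myocardium segmentation from a latent code."""
    def __init__(self, latent_ch=4, out_ch=1):
        super().__init__()
        layers, ch = [nn.Conv2d(latent_ch, 128, 3, padding=1),
                      nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True)], 128
        for _ in range(4):  # 8 -> 16 -> 32 -> 64 -> 128
            layers += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(ch // 2), nn.LeakyReLU(0.2, inplace=True),
                       ResBlock(ch // 2)]
            ch //= 2
        layers += [nn.Conv2d(ch, out_ch, 3, padding=1)]  # logits; loss applied separately
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)
```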

The loss function \(\mathcal {L}_\text {Shape~MAE}\) for the whole network is defined as follows:

$$\begin{aligned} \mathcal {L}_\text {Shape~MAE}= \mathcal {L}_{intra} + \alpha \mathcal {L}_{inter} + \beta \mathcal {L}_{reg} \end{aligned}$$
(1)

The first two terms of Eq. 1 are defined via the cross entropy loss \(\mathcal {F}\) between the predicted myocardium segmentation \(\hat{Y}_{i\rightarrow j}=D_j(E_i(X_i))\) of target view j, given a source image \(X_i\) of the same subject, and the corresponding ground truth segmentation \(Y_j\). \(\mathcal {L}_{intra}\) denotes the segmentation loss when the source view \(X_i\) and the target view \(Y_j\) correspond to the same view: \(\mathcal {L}_{intra}=\sum _{i=1, i=j}^{4}\mathcal {F}(Y_{j},\hat{Y}_{i\rightarrow j})\), whereas \(\mathcal {L}_{inter}\) denotes the loss when the two views are different: \(\mathcal {L}_{inter}=\sum _{i=1}^{4}\sum _{j=1, j \ne i}^{6} {\mathcal {F}}(Y_{j},\hat{Y}_{i\rightarrow j})\). The third term is a regularisation term on the latent representations \(z_i \in Z\): \(\mathcal {L}_{reg}= \frac{1}{|Z|} \sum _{i=1}^{4}{\left| \left| z_{i} -\bar{z} \right| \right| ^2}\), which penalises the squared L2 distance between \(z_i\) and \(\bar{z}\), where \(\bar{z} = \frac{1}{|Z|}\sum _{i=1}^{4}{z_i}\) is the average latent code for a subject. Although the latent shape codes obtained from different views of the same subject are not directly shared, this regularisation term forces them to be close to each other. The coefficients \(\alpha \) and \(\beta \) control the relative importance of \(\mathcal {L}_{inter}\) and \(\mathcal {L}_{reg}\).
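To make Eq. 1 concrete, the following PyTorch sketch assembles the three terms for one mini-batch. The helper names (`encoders`, `decoders`, `shape_mae_loss`) and the use of binary cross entropy with logits for the single-structure myocardium masks are assumptions for illustration; indexing follows the convention that target view j equals source view i for the four source views.

```python
import torch
import torch.nn.functional as F


def shape_mae_loss(encoders, decoders, sources, targets, alpha=0.5, beta=0.001):
    """sources: 4 source-view images X_i; targets: 6 target-view masks Y_j.
    By convention, target j == i corresponds to the same view as source i."""
    codes = [enc(x) for enc, x in zip(encoders, sources)]  # z_1 .. z_4

    l_intra, l_inter = 0.0, 0.0
    for i, z in enumerate(codes):
        for j, dec in enumerate(decoders):
            y_hat = dec(z)  # prediction \hat{Y}_{i -> j}
            ce = F.binary_cross_entropy_with_logits(y_hat, targets[j])
            if i == j:
                l_intra = l_intra + ce  # source and target are the same view
            else:
                l_inter = l_inter + ce  # cross-view reconstruction

    # L_reg: penalise the squared distance of each z_i to the subject average
    z_stack = torch.stack(codes)               # (4, B, C, 8, 8)
    z_bar = z_stack.mean(dim=0, keepdim=True)  # \bar{z}
    l_reg = ((z_stack - z_bar) ** 2).sum(dim=(2, 3, 4)).mean()

    return l_intra + alpha * l_inter + beta * l_reg
```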

The principle behind the proposed network is that each view requires an independent function to map it to a latent space that describes global shape characteristics, while translating this latent space back to a particular view or plane also requires a view-specific projection function. Predicting the shape of the myocardium in all six target views, instead of a single view, encourages the network to learn and exploit correlations between different views, resulting in a global, view-invariant shape representation rather than a local representation tied to a particular view. All the encoders and decoders in this framework are trained jointly in a multi-task learning fashion, which helps to avoid over-fitting and encourages model generalisation [3].

Fig. 2. (a) Overview of the proposed MV U-Net. (b) Architecture of the ‘Fuse Block’. The number of feature map blocks shown for the U-Net is reduced for clarity of presentation. Batch normalisation and ReLU activations are applied throughout the network. For each subject, the shape code of each view is reshaped to \(1\times 4\times 8\times 8\) and then concatenated with the other three along the second axis to form an input of \(1\times 32\times 8\times 8\) to the Fuse Block.

MV U-Net: Multi-view U-Net. As shown in Fig. 2, we propose a segmentation network called multi-view U-Net (MV U-Net), based on the original U-Net [11], for cardiac SA image segmentation. The proposed network is capable of incorporating the anatomical shape priors learned by Shape MAE. Similar to the original architecture, it comprises 4 down-sampling blocks and 4 up-sampling blocks to learn multi-scale features. Differently from the original U-Net, we reduce the number of filters at each level by a factor of four, to account for the fact that cardiac segmentation is a simpler task than the lesion segmentation (with multiple candidate structures) to which the original U-Net was applied. In addition, a module called ‘Fuse Block’ is introduced in the bottleneck of the network (see Fig. 2b) to inject the latent codes into the segmentation network. This fusing approach differs from that in [8], where the latent codes are simply concatenated with the U-Net activations. The proposed module consists of two convolutional layers (\(3\times 3\) kernels) and a residual connection, combining the shape representations from different views through learnable weights. Thanks to this module, given an arbitrary short-axis slice \(I^p\) of a subject p and its corresponding shape representations \(\{z_1^p, z_2^p, z_3^p, z_4^p\}\) obtained by Shape MAE (one for each of the four standard views), the network can predict a segmentation by distilling the prior knowledge into its high-level features, allowing it to efficiently refine the segmentation using multi-view information. The network is trained with a standard procedure, using a cross entropy loss to optimise the parameters \(\theta \) of the MV U-Net.
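A minimal sketch of how such a Fuse Block could be implemented is given below, assuming the concatenated shape codes form a \(32\times 8\times 8\) tensor (as in Fig. 2) and the U-Net bottleneck has 256 channels at \(8\times 8\) resolution (an assumption consistent with 16 first-layer filters and 4 down-sampling blocks on \(128\times 128\) inputs). How exactly the residual connection is wired inside the authors' Fuse Block may differ.

```python
import torch
import torch.nn as nn


class FuseBlock(nn.Module):
    """Injects the concatenated multi-view shape codes into the U-Net bottleneck."""
    def __init__(self, code_ch=32, bottleneck_ch=256):
        super().__init__()
        # two 3x3 convolutions mapping the shape codes to the bottleneck width
        self.fuse = nn.Sequential(
            nn.Conv2d(code_ch, bottleneck_ch, 3, padding=1),
            nn.BatchNorm2d(bottleneck_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, bottleneck_ch, 3, padding=1),
            nn.BatchNorm2d(bottleneck_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, bottleneck_feat, shape_codes):
        """bottleneck_feat: (B, 256, 8, 8) U-Net bottleneck features.
        shape_codes: list of four per-view codes from Shape MAE, each (B, C, 8, 8)."""
        codes = torch.cat(shape_codes, dim=1)                 # (B, 32, 8, 8) in total
        return self.act(bottleneck_feat + self.fuse(codes))   # residual injection
```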

3 Experiments and Results

Cardiac Multi-view Image Dataset. Experiments were performed on a dataset acquired from 734 subjects. For each subject, a stack of 2D SA slices and three orthogonal 2D LA images are available. The left ventricular (LV) myocardium was annotated on both the SA and the LA images at the end-diastolic (ED) frame, using an automated method followed by manual quality control. All images were acquired on the same scanner, with a spatial resolution of \(1.8 \times 1.8 \times 10\) mm.

In our experiments, the dataset was randomly split into two subsets: a training set (570 cases) and a test set (164 cases). All LA images were registered to a template subject using a rigid transformation with the MIRTK toolkit. All 2D SA slices were cropped to a size of \(128 \times 128\) pixels, with the left ventricle roughly at the center of every image. Benefiting from view planning (a standard step during cardiac image acquisition), we simply use the intersection point of the three orthogonal LA images with every SA slice to determine the center of its region of interest. All the networks were trained for 200 epochs on an NVIDIA GeForce 2080 Ti, using an Adam optimizer with a batch size of 10. The learning rate for Shape MAE was set to 0.0001, whereas the learning rate for the segmentation network was set to 0.001. In our experiments, \(\alpha \) was empirically set to 0.5 and \(\beta \) to 0.001 in \(\mathcal {L}_\text {{Shape~MAE}}\). The proposed algorithm was implemented in PyTorch.
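The two-stage schedule described above could be driven by a generic loop such as the one sketched below (Adam, batch size 10, 200 epochs, learning rates of 0.0001 for Shape MAE and 0.001 for the MV U-Net). The dataset objects and loss callables are placeholders, not part of the authors' code.

```python
import torch
from torch.utils.data import DataLoader


def train(model, dataset, loss_fn, lr, epochs=200, batch_size=10, device="cuda"):
    """Generic training loop; loss_fn computes the loss for one mini-batch."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = loss_fn(model, batch, device)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: train Shape MAE (lr = 1e-4) with the multi-view loss of Eq. 1, then
# freeze it and use its encoders to produce the four per-view shape codes.
# Stage 2: train the MV U-Net (lr = 1e-3) with a cross-entropy segmentation
# loss, feeding each SA slice together with the subject's shape codes.
```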

Segmentation Results. To evaluate segmentation accuracy, we use two measurements: the Dice score and the Hausdorff distance (HD). The proposed method is compared against a 2D U-Net [11], a state-of-the-art 2D FCN for cardiac MR image segmentation [1], and a 3D U-Net [4]. For fairness and ease of comparison, all models were configured with the same number of filters at each level (starting with 16 filters in the first layer) and trained with the same pre-processing and training schedule. For the 3D network, we resampled the SA images to a voxel size of \(1.8 \times 1.8 \times 1.8\) mm and cropped each to a size of \(128 \times 128 \times 64\) during pre-processing. We trained the MV U-Net and the baseline networks in two settings: one using 10% of the training set and the other using 100%. Of note, in each setting we first trained Shape MAE and then trained the MV U-Net, with the shape priors of the four standard views obtained from the corresponding Shape MAE encoders.
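For reference, a minimal implementation of the two metrics on binary myocardium masks could look as follows, using SciPy's Euclidean distance transform and the in-plane voxel spacing; the authors' exact evaluation code may differ (e.g. in how contours versus full masks are handled).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def dice_score(pred, gt):
    """Dice overlap between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom > 0 else 1.0


def hausdorff_distance(pred, gt, spacing=(1.8, 1.8)):
    """Symmetric Hausdorff distance (in mm) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    if not pred.any() or not gt.any():
        return np.nan
    # distance (in mm) from every pixel to the nearest foreground pixel of each mask
    dist_to_gt = distance_transform_edt(~gt, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred, sampling=spacing)
    return max(dist_to_gt[pred].max(), dist_to_pred[gt].max())
```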

Results on the test set are shown in Table 1. It can be observed that the proposed method outperforms the baseline models in both the low-data and the high-data setting, with improved Dice scores at the apex, middle and base of the left ventricular myocardium. In particular, when only 10% of the training data was used, the proposed method reduces the mean HD from 3.24 to 2.49 mm on the apical slices, from 2.34 to 2.09 mm on the mid-ventricular slices and from 3.62 to 2.76 mm on the basal slices, compared to the 2D U-Net. Figure 3 shows examples of the segmentation results from all the networks: the proposed method not only produces more robust segmentations across slices than the 2D networks, but also achieves more anatomically plausible results than the 3D one (see the red arrows in the figure). Visualisation results of the segmentation networks trained in the high-data setting and of Shape MAE are provided in the supplementary material.

Table 1. Comparison of the myocardium segmentation accuracy of the baseline models and the proposed method, in terms of the mean and standard deviation of the Dice score and the Hausdorff distance (HD, mm) on the test set (n = 164). The comparison is carried out separately for apical, mid-ventricular and basal slices.
Fig. 3. Visualisation of the predicted segmentations and the corresponding ground truth (GT) from the baseline models and the MV U-Net (all trained with 10% of the training subjects) on an apical, a mid-ventricular and a basal slice from one patient. Compared to the baseline models, the MV U-Net produces more accurate segmentations with stronger spatial coherence.

4 Discussion and Conclusion

In this work, we presented a shape-aware multi-view autoencoder, a neural network capable of learning anatomical shape priors from multiple standard views, as well as a multi-view U-Net, a modification of the original U-Net architecture that incorporates the learned shape priors to improve the robustness of cardiac segmentation. In contrast to existing works which treat long-axis and short-axis CMR segmentation as two separate tasks [1, 14], our approach is, to the best of our knowledge, the first to exploit the spatial context of the long-axis images to guide the segmentation of the short-axis images. The reported experimental results show that the proposed method not only demonstrates superior segmentation accuracy over state-of-the-art 2D baselines [1, 11], but also outperforms a 3D U-Net [4]. This improvement is particularly evident on the basal and apical slices in the low-data setting, as expected: when training data is limited, segmenting these challenging slices benefits most from the additional anatomical information extracted from the LA views and injected into the segmentation network. Of note, our approach does not require a dedicated acquisition protocol, since LA images are routinely acquired in most CMR imaging protocols. Moreover, the proposed MV U-Net maintains the computational advantage of a 2D network, using fewer parameters (\(\sim \)1.2 million weights) than the 3D U-Net (\(\sim \)2.5 million weights). This also contributes to the data efficiency of our method, which achieves high segmentation performance with limited training data. Importantly, our method could be extended in the future to multi-structure cardiac segmentation, and the proposed approach could potentially be adapted to other medical image segmentation tasks.