Abstract
3D ultrasound (3DUS) is widely used in clinical diagnosis. However, volume segmentation of 3DUS is very challenging due to relatively poor image quality and usually small datasets. We propose an efficient and robust method (ARS-Net) for single organ segmentation in 3DUS. Our contributions are twofold. (i) We propose a 2.5D framework based on a 2D segmentation network with 2.5D input, which provides more contextual and spatial information. The proposed framework also avoids the limitations of 3D networks, such as dependence on large datasets, lack of pre-trained models and high memory cost. (ii) To further enhance performance in low signal-to-noise ratio (SNR) regions, we incorporate a new mechanism of adaptively rectified supervision (ARS) into the proposed 2.5D framework at the training stage. Specifically, both a pixel-wise reweighted dice loss and an image-wise shape regularization loss are applied to improve the sensitivity and specificity of segmentation. Experimental results on two representative and challenging 3DUS datasets show that the proposed ARS-Net outperforms state-of-the-art methods with higher accuracy but lower complexity. The proposed network is robust to small datasets and provides an accurate and fast volume segmentation tool for 3DUS.
Keywords
- 3D Ultrasound
- 2.5D segmentation
- Adaptively rectified supervision
- Pixel-wise reweighted dice
- Image-wise shape regularization
1 Introduction
Ultrasound is the most commonly used imaging modality in prenatal examinations and disease diagnosis due to several advantages such as low cost, non-invasiveness, real-time imaging and the absence of radiation. Recently, 3DUS has been used in many clinical applications, for instance, fetal intracranial volume segmentation for assessing brain development [1] and prostate segmentation in transrectal ultrasound (TRUS) for the diagnosis of prostate cancer [2]. However, image segmentation for 3DUS is challenging due to acoustic shadow, speckle noise and low tissue contrast, which may cause missing boundaries and structures [3], leading to inaccurate results.
With the development of 3D convolutional neural networks (CNNs), some 3D models have been proposed to solve the problem of image segmentation in 3DUS [4, 5]. Although 3D CNNs can generate plausible results by combining contextual and spatial information, there are still some critical limitations for real clinical use in ultrasound. Firstly, the lack of pre-trained 3D models means the training process requires much more volume data, which is usually difficult to acquire. Secondly, to reduce memory consumption, 3D CNNs often have to reduce the batch size or divide the volume into several cubes, which may decrease the robustness of the model. Thirdly, the deployment of 3D CNNs in many ultrasound imaging systems is impractical due to limited computational units and memory (e.g., no GPU).
On the other hand, 2D networks generally have lower computational complexity and a wider choice of pre-trained models. A 2.5D network shares the same merits while adopting more contextual and spatial information, which makes it well suited for volume segmentation. Wang et al. [6] segmented lung nodules in CT by voxel-by-voxel classification with three orthogonal image patches as input. Mortazi et al. [7] applied a 2D FCN to segment the left atrium and proximal pulmonary veins in MRI on cross-sectional images along the axial, sagittal and coronal directions respectively, and then fused the 2D segmentation results into the final 3D output. Both [6] and [7] require three individual CNNs to extract features or segment objects, which may result in parameter redundancy and inefficient deployment.
In this paper, we propose an efficient and universal method for single organ segmentation in 3DUS. Our contributions are twofold. (i) A novel 2.5D volume segmentation framework is proposed that achieves high accuracy with low complexity. We believe this is the first successful attempt to employ a 2.5D end-to-end segmentation network for this problem. (ii) To further improve performance in low SNR regions, we incorporate a new mechanism of adaptively rectified supervision (ARS) at the training stage. Specifically, a pixel-wise reweighted dice (PRD) loss is calculated to improve the sensitivity of segmentation, and an image-wise shape regularization (ISR) loss provides domain knowledge of shape for more plausible results. The proposed method is extensively evaluated on two different tasks: fetal intracranial volume segmentation (132 volumes) and TRUS prostate volume segmentation (18 volumes).
2 Methods
Figure 1 illustrates the proposed 2.5D segmentation framework incorporating ARS, which applies the pixel-wise reweighted dice loss and the image-wise shape regularization loss at the training stage to improve the performance of the segmentation network.
2.1 Data Preprocessing
The task of 3D segmentation is converted to 2.5D by radially resampling an input volume into multiple planes (coronal) together with their orthogonal planes (sagittal) and 45° diagonal planes. It is worth noting that for the 2.5D input, the 45° diagonal plane is adopted rather than the axial plane for two reasons: (i) all target planes share the same axial plane in radial sampling, so it cannot provide additional contextual and spatial information; (ii) empirically, the 45° diagonal plane in 3DUS is often less sensitive to acoustic shadow. In addition, all 2D planes are sampled radially around a central axis that is automatically detected with an STN [8], which reduces the variability of image positioning.
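As a concrete illustration, the following minimal sketch shows one way to extract such 3-channel 2.5D inputs from an aligned volume. It assumes the detected central axis has already been mapped to the volume's z-axis by the STN; the helper names, plane geometry and channel order are our own illustrative choices, not the paper's implementation.

```python
# Minimal sketch of the 2.5D radial resampling (assumes the central axis is
# already aligned with the z-axis; names and geometry are illustrative).
import numpy as np
from scipy.ndimage import map_coordinates

def extract_plane(volume, angle_deg, size=224):
    """Sample one plane through the z-axis at the given azimuth angle."""
    z, y, x = volume.shape
    cy, cx = y / 2.0, x / 2.0
    theta = np.deg2rad(angle_deg)
    r = np.linspace(-min(cy, cx), min(cy, cx), size)  # radial coordinate
    d = np.linspace(0, z - 1, size)                   # depth along z
    dd, rr = np.meshgrid(d, r, indexing="ij")
    coords = np.stack([dd, cy + rr * np.sin(theta), cx + rr * np.cos(theta)])
    return map_coordinates(volume, coords, order=1)

def make_25d_input(volume, angle_deg):
    """Stack a plane with its orthogonal (+90°) and diagonal (+45°) views."""
    planes = [extract_plane(volume, angle_deg + off) for off in (0, 90, 45)]
    return np.stack(planes, axis=-1)  # (224, 224, 3) network input

# 60 radial samples per volume, as in the paper's experiments:
# inputs = [make_25d_input(vol, a) for a in np.linspace(0, 180, 60, endpoint=False)]
```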
2.2 Network Architecture
We choose 2D FCN [9] as our basic segmentation network, though any other 2D segmentation network, such as U-Net [13], is also applicable. Three modifications to the basic segmentation network are made. Firstly, a pre-trained VGG-16 [10] is used as the backbone architecture. Secondly, the PRD loss is computed with an adaptive weight map (AWM) to improve performance in low SNR regions. The AWM is generated by a specific attention mechanism based on: (i) the ground truth segmentation, (ii) the probability map of the predicted segmentation from the last epoch, and (iii) the AWM from the last epoch. Thirdly, the ISR loss is adopted to avoid shape distortion by adding an auxiliary discriminator network (DN), similar to [11] but with the backbone replaced by the same pre-trained VGG-16 [10] mentioned above. The proposed ARS mechanism consists of the PRD loss and the ISR loss.
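A compact Keras sketch of such a backbone is given below: an FCN-8s-style decoder on top of an ImageNet-pretrained VGG-16 encoder with a 3-channel (2.5D) input. The decoder layer choices are assumptions on our part; the paper only specifies FCN [9] with a VGG-16 backbone.

```python
# Sketch of the 2.5D segmentation backbone: FCN-8s-style decoder on a
# pre-trained VGG-16 encoder (decoder details are illustrative).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_25d_fcn(input_shape=(224, 224, 3)):
    vgg = tf.keras.applications.VGG16(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Skip features at 1/8, 1/16 and 1/32 resolution.
    pool3 = vgg.get_layer("block3_pool").output
    pool4 = vgg.get_layer("block4_pool").output
    pool5 = vgg.get_layer("block5_pool").output

    score5 = layers.Conv2D(1, 1)(pool5)
    up5 = layers.Conv2DTranspose(1, 4, strides=2, padding="same")(score5)
    score4 = layers.Conv2D(1, 1)(pool4)
    up4 = layers.Conv2DTranspose(1, 4, strides=2, padding="same")(
        layers.Add()([up5, score4]))
    score3 = layers.Conv2D(1, 1)(pool3)
    fused = layers.Add()([up4, score3])
    # Upsample back to input resolution; sigmoid gives the probability map P.
    logits = layers.Conv2DTranspose(1, 16, strides=8, padding="same")(fused)
    prob = layers.Activation("sigmoid")(logits)
    return Model(vgg.input, prob)
```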
2.3 Adaptively Rectified Supervision
There are already many powerful CNN-based segmentation methods for natural images. However, segmentation for 3DUS is challenging due to ambiguous or missing boundaries (low SNR regions), which often lead to irregular and erroneous segmentation results. To overcome this issue, we introduce a new ARS loss function to replace the regular dice loss of the FCN. Specifically, the PRD loss \( L_{prd} \) and the ISR loss \( L_{isr} \) are combined to jointly supervise the generation of the segmentation probability map. The ARS loss function is defined as:

$$ L_{ars} \left( {X,G,Y,W} \right) = w_{prd} L_{prd} \left( {X,G,W} \right) + w_{isr} L_{isr} \left( {X,Y} \right) $$

where \( X \) is an input image, \( G \) and \( Y \) denote the corresponding ground truth of the pixel-wise label (segmentation) and the image-wise shape authenticity (real or fake) respectively, \( W \) is the corresponding AWM, and \( w_{prd} \) and \( w_{isr} \) are the weights of \( L_{prd} \) and \( L_{isr} \). The details of \( L_{prd} \) and \( L_{isr} \) are explained in the following paragraphs.
Pixel-Wise Reweighted Dice Loss.
Since low SNR regions usually occupy only a small fraction of the whole volume (<10%), most segmentation networks are not sensitive to those regions and may suffer from overfitting. Inspired by focal loss [12], the PRD loss is designed to address this imbalance. Specifically, based on the AWM (see Sect. 2.4 for details), the dice loss is recalculated as the PRD loss, defined as:

$$ L_{prd} \left( {X,G,W} \right) = 1 - \frac{{2\sum\nolimits_{j} {W_{j,k} G_{j} P_{j,k} } }}{{\sum\nolimits_{j} {W_{j,k} \left( {G_{j} + P_{j,k} } \right)} }} $$

where \( G_{j} \in \left\{ {0,1} \right\} \) is the ground truth label at location \( j \); \( W_{j,k} \in \left( {0,1} \right) \) and \( P_{j,k} = \frac{1}{{1 + e^{{ - z_{j,k} }} }} \in \left( {0,1} \right) \) denote the value of the AWM and the probability of the predicted segmentation at location \( j \) in epoch \( k \), respectively; and \( z \) is the output of the last convolutional layer of the FCN.
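A minimal TensorFlow sketch of this loss, under our reading of the definitions above, is:

```python
# Pixel-wise reweighted dice (PRD) loss: a soft dice in which every pixel is
# scaled by the adaptive weight map W before the overlap is computed.
import tensorflow as tf

def prd_loss(g, p, w, eps=1e-6):
    """g: ground truth in {0,1}; p: predicted probabilities; w: AWM in (0,1).
    All tensors share shape (batch, H, W, 1)."""
    axes = [1, 2, 3]
    inter = tf.reduce_sum(w * g * p, axis=axes)
    denom = tf.reduce_sum(w * (g + p), axis=axes)
    return tf.reduce_mean(1.0 - 2.0 * inter / (denom + eps))
```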
Image-wise Shape Regularization Loss.
To further improve the specificity of segmentation, the discriminator network described in Sect. 2.2 is used to compute the ISR loss, which helps generate robust results with plausible shapes. A training image and its predicted segmentation from the FCN are fed to the discriminator network, which classifies the shape as real or fake using binary cross entropy. The ISR loss function is defined as:

$$ L_{isr} \left( {X,Y} \right) = - \left[ {Y\log P + \left( {1 - Y} \right)\log \left( {1 - P} \right)} \right] $$

where \( Y \in \left\{ {0,1} \right\} \) and \( P \in \left( {0,1} \right) \) denote the ground truth label and the prediction of real or fake shape, respectively.
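In TensorFlow, a minimal sketch of this binary cross entropy is:

```python
# Image-wise shape regularization (ISR) loss: binary cross entropy on the
# discriminator's real/fake shape prediction, following the definitions above.
import tensorflow as tf

def isr_loss(y, p, eps=1e-7):
    """y: shape authenticity label in {0,1}; p: discriminator output in (0,1)."""
    p = tf.clip_by_value(p, eps, 1.0 - eps)  # numerical safety for log
    return tf.reduce_mean(
        -(y * tf.math.log(p) + (1.0 - y) * tf.math.log(1.0 - p)))
```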
2.4 Adaptive Weight Map
The AWM is generated by a specific attention mechanism that adaptively decreases the weights in regions of high accuracy while maintaining the weights in regions of low accuracy (low SNR regions). In detail, the AWM \( W_{j,k} \) is iteratively updated from the ground truth segmentation \( G_{j} \), the probability map of the predicted segmentation from the last epoch \( P_{j,k - 1} \) and the AWM from the last epoch \( W_{j,k - 1} \):

$$ W_{j,k} = \alpha E_{j,k - 1} W_{j,k - 1} + \left( {1 - E_{j,k - 1} } \right), \quad E_{j,k - 1} = 1 - \left| {G_{j} - P_{j,k - 1} } \right| $$

The modulation factor \( \alpha \) is set to 0.8 empirically, and \( E_{j,k - 1} \) measures the agreement between the last prediction and the ground truth. When \( E_{j,k - 1} \to 1 \), the prediction \( P_{j,k - 1} \) accords with the ground truth \( G_{j} \), which means the pixel at location \( j \) is well segmented; similarly, when \( E_{j,k - 1} \to 0 \), the pixel at location \( j \) is erroneously segmented. Furthermore, when \( E_{j,k - 1} \to 1 \), \( W_{j,k} \to \alpha E_{j,k - 1} W_{j,k - 1} < W_{j,k - 1} \), so the weight for the PRD loss decreases in well segmented regions. On the other hand, when \( E_{j,k - 1} \to 0 \), \( W_{j,k} \to \left( {1 - E_{j,k - 1} } \right) \approx 1 \), so the weight for the PRD loss stays high in poorly segmented regions. Through this adaptive reweighting mechanism, regions of low accuracy contribute more to the calculation of the dice loss and vice versa.
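A minimal NumPy sketch of this per-epoch update, with \( E \) taken as \( 1 - \left| {G - P} \right| \) as reconstructed above, is:

```python
# AWM update rule, run once per epoch per training image; alpha = 0.8 as in
# the paper. E ~ 1 where the last prediction agrees with the ground truth.
import numpy as np

def update_awm(w_prev, g, p_prev, alpha=0.8):
    """w_prev: AWM from last epoch; g: ground truth in {0,1};
    p_prev: last epoch's predicted probabilities; all the same shape."""
    e = 1.0 - np.abs(g - p_prev)             # ~1 where well segmented
    return alpha * e * w_prev + (1.0 - e)    # decay good regions, keep bad ones

# Initialization is not specified in the paper; a uniform map is one choice:
# w = np.ones_like(g, dtype=np.float32)
```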
2.5 Postprocessing for Final Result
For the final volume segmentation, the 2D segmentation results are combined and reconstructed using cubic-spline interpolation. An additional 3D Gaussian filter can also be applied to smooth the volume segmentation, reducing the discontinuity of the multi-plane reconstruction.
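As a sketch of the smoothing step (the multi-plane reconstruction itself is omitted), a 3D Gaussian filter followed by re-thresholding can suppress stair-step discontinuities between neighbouring radial planes; the sigma and threshold values here are illustrative:

```python
# Smooth a reconstructed 3D mask and re-threshold to a binary segmentation.
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_segmentation(volume_mask, sigma=1.0, threshold=0.5):
    """volume_mask: reconstructed 3D mask with values in [0, 1]."""
    smoothed = gaussian_filter(volume_mask.astype(np.float32), sigma=sigma)
    return (smoothed > threshold).astype(np.uint8)
```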
3 Experiments
Materials.
Experiments were carried out on two representative and challenging 3DUS datasets acquired with the DC-8 and Resona 7 Ultrasound Imaging Systems (Mindray, Shenzhen, China). The first dataset consists of 132 fetal brain volumes (94 for training, 38 for testing) with gestational age (GA) ranging from 20 to 32 weeks, scanned with a curved-array volume probe. The second dataset consists of 18 TRUS prostate volumes (10 for training, 8 for testing) scanned with an endocavity volume probe. These two types of probe are the most commonly used for volume analysis in 3DUS. For data preprocessing, all images were standardized by resampling to an isotropic resolution of 0.5 × 0.5 × 0.5 mm, and all 2.5D images were resized to 224 × 224.
Implementation Details.
Our proposed network was implemented with the popular Keras library for TensorFlow, and both training and testing were performed on a 16 GB NVIDIA V100 GPU. Each volume was radially resampled into 60 planes (each plane combined with its orthogonal and 45° views as a 3-channel input for ARS-Net), which significantly increases the number of training samples and thus alleviates the overfitting problem caused by limited volume data. We further adopted data augmentation (flipping, cropping, rotation, and translation) at the training stage. An initial FCN was pre-trained with the PRD loss with batch size = 32 and learning rate = 1e-3 (decayed by a factor of 0.95 every epoch). After that, the pre-trained 2.5D FCN and the DN were trained alternately. In every epoch, the DN (ISR loss only, lr = 1e-4) was trained for 3 batches, followed by the 2.5D FCN (PRD + ISR loss, lr = 1e-5) trained for 1 batch. The optimizer was Adam with momentum 0.9, and \( L_{prd} \) and \( L_{isr} \) were equally weighted. The total training time was about 6–7 h.
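The alternating adversarial fine-tuning stage can be sketched as follows. Here `fcn`, `dn`, `prd_loss`, `isr_loss` and the `(image, label, AWM)` batch iterator are assumed from the sketches above, the `step % 4` scheduling is our approximation of the 3:1 batch ratio, and feeding the image concatenated with the mask to the DN is likewise an assumption:

```python
# Alternating schedule: per epoch, 3 DN batches (ISR only) then 1 FCN batch
# (PRD + ISR, equally weighted), with the learning rates from the paper.
import tensorflow as tf

opt_dn = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9)
opt_fcn = tf.keras.optimizers.Adam(learning_rate=1e-5, beta_1=0.9)

def train_epoch(fcn, dn, batches):
    for step, (x, g, w) in enumerate(batches):
        if step % 4 < 3:  # DN batch: real shapes labelled 1, predictions 0
            p = tf.stop_gradient(fcn(x, training=False))
            with tf.GradientTape() as tape:
                d_real = dn(tf.concat([x, g], axis=-1), training=True)
                d_fake = dn(tf.concat([x, p], axis=-1), training=True)
                loss = isr_loss(tf.ones_like(d_real), d_real) + \
                       isr_loss(tf.zeros_like(d_fake), d_fake)
            opt_dn.apply_gradients(
                zip(tape.gradient(loss, dn.trainable_variables),
                    dn.trainable_variables))
        else:  # FCN batch: PRD + ISR, trying to fool the DN
            with tf.GradientTape() as tape:
                p = fcn(x, training=True)
                d_fake = dn(tf.concat([x, p], axis=-1), training=False)
                loss = prd_loss(g, p, w) + \
                       isr_loss(tf.ones_like(d_fake), d_fake)
            opt_fcn.apply_gradients(
                zip(tape.gradient(loss, fcn.trainable_variables),
                    fcn.trainable_variables))
```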
Segmentation Performance.
We compared the proposed method with several state-of-the-art methods, including 3D FCN [4], 2D FCN [9], U-Net [13], DAF [2] and a two-stage FCN + LSTM framework [14]. We also evaluated the proposed method with different loss functions, namely PRD alone and PRD + ISR. The evaluation metrics included the Dice similarity coefficient (DSC), Hausdorff distance (HD, in mm), conformity coefficient (CC) and Jaccard index.
Figure 2 shows the results of fetal intracranial volume segmentation with different methods and loss functions. With the PRD loss, the proposed ARS-Net is robust to blurry boundaries and low SNR regions, while the ISR loss corrects irregular shapes. Overall, the PRD loss improves the sensitivity of segmentation while the ISR loss further improves the specificity.
Table 1 lists the quantitative comparison for fetal intracranial volume segmentation. The proposed ARS-Net (PRD + ISR) improves DSC by 3.01% and 9.37% compared with 2D and 3D FCN respectively, and achieves accuracy comparable to FCN + LSTM while being 9 times faster, as shown in Table 3. ARS-Net also reaches the lowest mean Hausdorff distance (1.31 mm); since the biparietal diameter (BPD) of a normal fetus (GA 20–32 weeks) ranges from 46 to 80 mm, this accuracy is acceptable for clinical use. Table 2 lists the quantitative comparison for TRUS prostate volume segmentation. The proposed method is slightly more accurate than FCN + LSTM [14] and DAF [2] but significantly faster and smaller, as shown in Table 3. It is worth noting that the accuracy of 3D FCN is lower than that of 2D FCN, mainly because of limited training samples, small batch size and the lack of suitable pre-trained models. The proposed method shows advantages in accuracy, speed, model size and memory occupation, making it an ideal solution for deployment in most ultrasound imaging systems.
4 Conclusion
In this paper, we propose an efficient 2.5D framework that enables single organ segmentation in 3DUS images with high accuracy but low complexity. To the best of our knowledge, we are the first to use a 2.5D end-to-end segmentation network for this problem. In the proposed ARS-Net, a novel attention mechanism is introduced to reweight the dice loss at each pixel, which improves the sensitivity of segmentation in low SNR regions. Furthermore, a discriminator network is used to constrain results to plausible shapes, which improves the specificity of segmentation. Although additional modules are used at the training stage, the complexity of ARS-Net at the inference stage is as low as that of a regular 2D FCN. Compared with 3D FCN, ARS-Net is more robust on small datasets. Validation on fetal brain (132 volumes) and TRUS prostate (18 volumes) shows that ARS-Net achieves DSCs of 97.64% and 95.30%, respectively. Our method provides an accurate and fast volume segmentation tool for 3DUS and has the potential to be applied to other imaging modalities.
References
Namburete, A.I.L., Xie, W., Yaqub, M., Zisserman, A., Noble, J.A.: Fully-automated alignment of 3D fetal brain ultrasound to canonical reference space using multi-task learning. Med. Image Anal. 46, 1–14 (2018)
Wang, Y., et al.: Deep attentional features for prostate segmentation in ultrasound. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 523–530. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_60
Noble, J.A., Boukerroui, D.: Ultrasound image segmentation: a survey. IEEE Trans. Med. Imaging 25(8), 987–1010 (2006)
Yang, X., et al.: Towards automated semantic segmentation in prenatal volumetric ultrasound. IEEE Trans. Med. Imaging 38(1), 180–193 (2018)
Degel, M.A., Navab, N., Albarqouni, S.: Domain and geometry agnostic CNNs for left atrium segmentation in 3D ultrasound. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 630–637. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_72
Wang, S., et al.: A multi-view deep convolutional neural networks for lung nodule segmentation. In: EMBC (2017)
Mortazi, A., Karim, R., Rhode, K., Burt, J., Bagci, U.: CardiacNET: segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 377–385. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_43
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015)
Long, J., Shelhamer, E., Darrell T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Yang, D., et al.: Automatic liver segmentation using an adversarial image-to-image network. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 507–515. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_58
Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chen, J., Yang, L., Zhang, Y., Alber, M., Chen, D.Z.: Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In: NIPS (2016)
Acknowledgments
This work was supported by the National Key R&D Program of China [2016YFC0104700].