Abstract
Convolutional neural networks (CNNs) have become the prevailing technique in medical CT image processing. Although encoder-decoder CNNs exploit locality for efficiency, they cannot adequately model long-range pixel relationships. Recent work has shown that stacking self-attention or transformer layers can effectively learn long-range dependencies, and transformers have been extended to computer vision tasks by splitting images into patches and treating the patches as token embeddings. However, transformer-based architectures lack global semantic information interaction and require large-scale datasets for training, which makes them difficult to train effectively with limited data. To address these issues, we propose a hierarchical context-attention transformer network (HT-Net), which integrates multi-scale, transformer, and hierarchical context extraction modules into the skip connections. The multi-scale module captures richer CT semantic information, enabling the transformer to better encode feature maps of tokenized image patches, taken from different stages of the CNN, as input attention sequences. The hierarchical context attention module supplements global information and re-weights pixels to capture semantic context. Extensive experiments on three datasets demonstrate that the proposed HT-Net outperforms state-of-the-art approaches.
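The abstract notes that transformers are carried over to vision by treating image patches as token embeddings, and that self-attention captures long-range dependencies that convolutions miss. The sketch below illustrates only that generic mechanism (patch tokenization followed by single-head scaled dot-product attention). It assumes a single-channel 2D slice, non-overlapping square patches, one attention head, and random placeholder weights. It is not the HT-Net implementation, and the function names `tokenize_patches`, `self_attention`, and `random_matrix` are hypothetical.

```python
import math
import random


def tokenize_patches(image, patch):
    """Split an H x W single-channel image (list of rows) into non-overlapping
    patch x patch tiles, in row-major order, and flatten each tile into one token."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Every token attends to every other token, so the receptive field is global."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    """Hypothetical stand-in for learned weights: small Gaussian values."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Usage: an 8 x 8 toy "slice" split into four 4 x 4 patches (16 values each),
# linearly embedded to d_model = 8 and passed through one attention layer.
random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]
tokens = tokenize_patches(image, patch=4)
d_model = 8
embedded = matmul(tokens, random_matrix(16, d_model))
out, attn = self_attention(embedded, random_matrix(d_model, d_model),
                           random_matrix(d_model, d_model),
                           random_matrix(d_model, d_model))
print(len(tokens), len(tokens[0]), len(out), len(out[0]))  # 4 16 4 8
```

Because each token's attention row spans all patches, the output for every patch mixes information from the whole slice, which is the long-range modeling the abstract contrasts with convolutional locality. A practical model would add learned weights, multiple heads, and positional embeddings; per the abstract, HT-Net tokenizes CNN feature maps from several stages rather than raw pixels.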
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61762014, in part by the Science and Technology Project of Guangxi under Grant 2018GXNSFAA281351, and in part by the Innovation Project of Guangxi Graduate Education under Grant YCSW2021096.
About this article
Cite this article
Ma, M., Xia, H., Tan, Y. et al. HT-Net: hierarchical context-attention transformer network for medical ct image segmentation. Appl Intell 52, 10692–10705 (2022). https://doi.org/10.1007/s10489-021-03010-0
DOI: https://doi.org/10.1007/s10489-021-03010-0