Abstract
Multi-modal MRI has become a valuable tool for diagnosing and investigating brain tumors because its modalities provide complementary information. However, traditional UNet-based methods for multi-modal MRI segmentation typically fuse the modalities at an early or middle stage of the network, without modeling inter-modal feature fusion or dependencies. To address this, we propose CMMFNet (cross-modal multi-scale fusion network), which explores both intra-modality and inter-modality relationships in brain tumor segmentation. The network is built on a transformer-based multi-encoder, single-decoder structure that performs nested multi-modal fusion on the high-level representations of the different modalities. In addition, CMMFNet uses a focusing mechanism that obtains larger receptive fields at the low-level scale and connects the resulting features to the decoding layers. The multi-modal feature fusion module nests modality-aware feature aggregation, and the multi-modal features are fused through long-range dependencies within each modality, captured by the self-attention and cross-attention layers. Experiments show that CMMFNet outperforms state-of-the-art methods on the BraTS2020 benchmark dataset for brain tumor segmentation.
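To make the intra-modality and inter-modality idea concrete, the following is a minimal NumPy sketch of self-attention within each modality followed by cross-attention across modalities. It is not the CMMFNet implementation: the function names (`attention`, `fuse_modalities`), the token and feature sizes, and the absence of learned query/key/value projections are all assumptions made for illustration. The four modality names follow the usual BraTS convention and are used only as labels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention without learned projections (a simplification).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def fuse_modalities(tokens_by_modality):
    """Hypothetical fusion step. Input: dict of name -> (n_tokens, dim) array."""
    # Intra-modality: each modality attends to its own tokens (residual added).
    refined = {name: t + attention(t, t, t)
               for name, t in tokens_by_modality.items()}
    # Inter-modality: each modality queries the tokens of all other modalities.
    fused = {}
    for name, t in refined.items():
        others = np.concatenate(
            [o for other, o in refined.items() if other != name], axis=0)
        fused[name] = t + attention(t, others, others)
    # Concatenate the per-modality results into one token sequence.
    return np.concatenate([fused[name] for name in tokens_by_modality], axis=0)

rng = np.random.default_rng(0)
# Assumed sizes: 8 tokens of dimension 16 per modality.
modalities = {m: rng.standard_normal((8, 16))
              for m in ("T1", "T1ce", "T2", "FLAIR")}
fused = fuse_modalities(modalities)
print(fused.shape)  # (32, 16)
```

In a trained network, each attention call would use learned projections and multiple heads, and the fused tokens would feed a single shared decoder. The sketch only shows how the two attention patterns differ in what serves as keys and values.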
Acknowledgements
The authors gratefully acknowledge the support from the National Natural Science Foundation of China under Grant numbers 62272342, 62020106004, and 92048301.
About this article
Cite this article
Zheng, J., Shi, F., Zhao, M. et al. Learning intra-inter-modality complementary for brain tumor segmentation. Multimedia Systems 29, 3771–3780 (2023). https://doi.org/10.1007/s00530-023-01138-2
DOI: https://doi.org/10.1007/s00530-023-01138-2