
MeFD-Net: multi-expert fusion diagnostic network for generating radiology image reports


Abstract

Recently, the task of radiology image report generation has attracted considerable attention from researchers. Research on this task not only relieves radiologists of tedious work but also raises healthcare standards in underdeveloped regions. Previous methods largely followed the image captioning paradigm, using an encoder-decoder architecture to forcibly align the visual and textual domains; however, they overlooked the cross-modal semantic gap between the two. Inspired by the multi-expert collaborative diagnosis model used in hospitals, we develop a “multi-expert diagnostic” mechanism to bridge this gap. To this end, we propose the Multi-expert Diagnostic Module (MeDM), whose key design introduces multiple learnable matrices that act in place of experts and interactively learn from radiology images and their corresponding reports. Specifically, each expert matrix interacts with visual-textual features to capture rich multimodal information. To ensure that different expert matrices focus on different feature information, they are constrained by an orthogonal loss. Additionally, we design a lightweight Diagnostic Fusion Module (DFM) to integrate and summarize the outputs of the multiple expert matrices. Experimental results on two widely used datasets show that the proposed method leads on most metrics.
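As a rough illustration of the mechanism the abstract describes, the sketch below implements K learnable expert matrices that each interact with fused visual-textual features, an orthogonality penalty that pushes the experts toward different information, and a small linear fusion layer standing in for the DFM. This is a minimal PyTorch sketch based only on the abstract: the class, parameter, and shape choices (MultiExpertDiagnostic, num_experts, tokens, orthogonal_loss) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a multi-expert diagnostic mechanism (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExpertDiagnostic(nn.Module):
    """K learnable 'expert' matrices, each attending over fused
    visual-textual features, followed by a lightweight fusion step."""

    def __init__(self, num_experts: int = 4, dim: int = 512, tokens: int = 8):
        super().__init__()
        # One learnable (tokens x dim) matrix per expert.
        self.experts = nn.Parameter(torch.randn(num_experts, tokens, dim) * 0.02)
        # Lightweight fusion: concatenate expert summaries, project back to dim.
        self.fuse = nn.Linear(num_experts * dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, dim) fused visual-textual features.
        summaries = []
        for k in range(self.experts.shape[0]):
            # Each expert matrix queries the features via scaled dot-product attention.
            q = self.experts[k].unsqueeze(0).expand(feats.size(0), -1, -1)
            attn = torch.softmax(q @ feats.transpose(1, 2) / feats.size(-1) ** 0.5, dim=-1)
            summaries.append((attn @ feats).mean(dim=1))  # (batch, dim) per expert
        return self.fuse(torch.cat(summaries, dim=-1))    # (batch, dim)

    def orthogonal_loss(self) -> torch.Tensor:
        # Penalize pairwise overlap between flattened expert matrices so that
        # different experts focus on different feature information.
        e = F.normalize(self.experts.flatten(1), dim=-1)  # (K, tokens*dim)
        gram = e @ e.t()                                  # (K, K) cosine similarities
        off_diag = gram - torch.eye(e.size(0), device=e.device)
        return off_diag.pow(2).sum()
```

During training, such an orthogonality term would be added to the usual report-generation loss with a small weight, so that the experts specialize rather than collapse onto the same features.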



Data availability and access

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was partially supported by the Natural Science Foundation of Chongqing (No. CSTB2023NSCQ-MSX0407), the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202200551), the Key Project of the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJZD-K202100505), the Chongqing Technology Innovation and Application Development Project (No. cstc2020jscx-msxmX0190), and the Chongqing Normal University Foundation (No. 21XLB026).

Author information


Contributions

Ruisheng Ran: conceptualization, writing - review and editing. Renjie Pan: writing - original draft, writing - review and editing. Wen Yang: data curation, validation. Yan Deng: investigation, writing - review. Wenfeng Zhang: conceptualization, funding acquisition, resources, supervision, writing - review and editing. Wei Hu: coding, experiment. Qibing Qing: review, revision.

Corresponding authors

Correspondence to Wenfeng Zhang or Wei Hu.

Ethics declarations

Ethical and informed consent for data used

All datasets used in this paper are public datasets that can be downloaded through publicly available channels.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.


About this article


Cite this article

Ran, R., Pan, R., Yang, W. et al. MeFD-Net: multi-expert fusion diagnostic network for generating radiology image reports. Appl Intell 54, 11484–11495 (2024). https://doi.org/10.1007/s10489-024-05680-y

