Abstract
Radiology image report generation has recently attracted considerable attention from researchers. Automating this task can relieve radiologists of tedious work and help raise healthcare standards in underdeveloped regions. Previous methods have largely followed image captioning, using an encoder-decoder architecture to forcibly align the visual and textual domains while overlooking the cross-modal semantic gap between them. Inspired by the multi-expert collaborative diagnosis practiced in hospitals, we develop a “multi-expert diagnostic” mechanism to bridge this gap. To this end, we propose the Multi-expert Diagnostic Module (MeDM). Its key design introduces multiple learnable matrices that take the place of the experts' reasoning and learn interactively from radiology images and their corresponding reports. Specifically, each expert matrix interacts with the visual-textual features to capture rich multimodal information, and an orthogonal loss constrains the expert matrices so that each one focuses on different feature information. In addition, we design a lightweight Diagnostic Fusion Module (DFM) that integrates and summarizes the outputs of the multiple expert matrices. Experiments on two widely used datasets show that the proposed method achieves the best performance on most metrics.
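The abstract names three interacting pieces: learnable expert matrices that interact with visual-textual features, an orthogonal loss that keeps the experts from learning the same thing, and a lightweight fusion step. The pure-Python sketch below illustrates one plausible reading of these ideas. It is not the authors' implementation, and every function name in it is hypothetical. Assumptions: each expert is a small matrix of query rows that attends to the feature tokens through scaled dot-product attention; the orthogonal loss is taken to be the sum of squared cosine similarities between distinct flattened expert matrices (one common form of such a penalty, since the exact formula is not given here); and `fuse`, a hypothetical stand-in for DFM, mean-pools each expert's output and combines the pooled vectors with softmax-normalised gate scores.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def expert_attend(expert: Matrix, feats: Matrix) -> Matrix:
    """Assumed expert interaction: the expert's rows act as queries and the
    visual-textual feature rows act as both keys and values."""
    d = len(feats[0])
    scores = matmul(expert, transpose(feats))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, feats)  # each output row is a convex mix of feature rows


def orthogonal_loss(experts: List[Matrix]) -> float:
    """Assumed penalty: sum of squared cosine similarities over every ordered
    pair of distinct flattened experts; it is 0 when they are mutually orthogonal."""
    flat = [[x for row in e for x in row] for e in experts]
    unit = []
    for v in flat:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        unit.append([x / norm for x in v])
    loss = 0.0
    for i in range(len(unit)):
        for j in range(len(unit)):
            if i != j:
                cos = sum(a * b for a, b in zip(unit[i], unit[j]))
                loss += cos * cos
    return loss


def fuse(expert_outputs: List[Matrix], gate_scores: List[float]) -> List[float]:
    """Hypothetical lightweight fusion: mean-pool each expert's output, then take
    a softmax-weighted sum of the pooled vectors."""
    pooled = [[sum(col) / len(out) for col in zip(*out)] for out in expert_outputs]
    w = softmax(gate_scores)
    return [sum(w[k] * pooled[k][c] for k in range(len(pooled)))
            for c in range(len(pooled[0]))]


if __name__ == "__main__":
    random.seed(0)
    d, n_tokens, n_rows, n_experts = 4, 6, 2, 3
    feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
    experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(n_rows)]
               for _ in range(n_experts)]
    outs = [expert_attend(e, feats) for e in experts]
    fused = fuse(outs, gate_scores=[0.0] * n_experts)
    print(len(fused), orthogonal_loss(experts))
```

In this sketch `fused` is a single `d`-dimensional vector, and the orthogonal loss is positive unless the flattened experts happen to be mutually orthogonal. In a trained model the expert matrices and gate scores would presumably be learnable parameters updated jointly with the report-generation loss; the code only shows the forward computation.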
Data availability and access
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was partially supported by the Natural Science Foundation of Chongqing (No. CSTB2023NSCQ-MSX0407), the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202200551), the Key Project for Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJZD-K202100505), the Chongqing Technology Innovation and Application Development Project (No. cstc2020jscx-msxmX0190), and the Chongqing Normal University Foundation (No. 21XLB026).
Author information
Contributions
Ruisheng Ran: conceptualization, writing - review and editing. Renjie Pan: writing - original draft, writing - review and editing. Wen Yang: data curation, validation. Yan Deng: investigation, writing - review. Wenfeng Zhang: conceptualization, funding acquisition, resources, supervision, writing - review and editing. Wei Hu: coding, experiments. Qibing Qing: review and revision.
Ethics declarations
Ethical and informed consent for data used
All datasets used in this paper are publicly available and can be obtained through public channels upon request.
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ran, R., Pan, R., Yang, W. et al. MeFD-Net: multi-expert fusion diagnostic network for generating radiology image reports. Appl Intell 54, 11484–11495 (2024). https://doi.org/10.1007/s10489-024-05680-y
DOI: https://doi.org/10.1007/s10489-024-05680-y