Retrieving Multimodal Information for Augmented Generation: A Survey

Ruochen Zhao; Hailin Chen; Weishi Wang; Fangkai Jiao; Xuan Long Do; Chengwei Qin; Bosheng Ding; Xiaobao Guo; Minzhi Li; Xingxuan Li; Shafiq Joty

doi:10.18653/v1/2023.findings-emnlp.314

Abstract

As Large Language Models (LLMs) have become popular, an important trend has emerged of using multimodality to augment LLMs’ generation ability, enabling them to better interact with the world. However, there is no unified understanding of at which stage, and how, to incorporate different modalities. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, in formats ranging from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey aims to give scholars a deeper understanding of the methods’ applications and to encourage them to adapt existing techniques to the fast-growing field of LLMs.

Anthology ID: 2023.findings-emnlp.314
Original: 2023.findings-emnlp.314v1
Version 2: 2023.findings-emnlp.314v2
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4736–4756
URL: https://aclanthology.org/2023.findings-emnlp.314/
DOI: 10.18653/v1/2023.findings-emnlp.314
Cite (ACL): Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, and Shafiq Joty. 2023. Retrieving Multimodal Information for Augmented Generation: A Survey. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4736–4756, Singapore. Association for Computational Linguistics.
Cite (Informal): Retrieving Multimodal Information for Augmented Generation: A Survey (Zhao et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.314.pdf
