Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning

Chunpu Xu, Yu Li, Chengming Li, Xiang Ao, Min Yang, Jinwen Tian


Abstract
Image paragraph captioning (IPC) aims to generate a fine-grained paragraph that describes the visual content of an image. Deep neural networks have made significant progress on this task, and the attention mechanism plays an essential role in them. However, conventional attention mechanisms tend to ignore past alignment information, which often leads to repetitive and incomplete captions. In this paper, we propose an Interactive key-value Memory-augmented Attention model for image Paragraph captioning (IMAP) that keeps track of the attention history (i.e., coverage information of salient objects) along with the update chain of the decoder state, and therefore avoids generating repetitive or incomplete image descriptions. In addition, we employ an adaptive attention mechanism to realize adaptive alignment from image regions to caption words, where an image region can be mapped to an arbitrary number of caption words and a caption word can attend to an arbitrary number of image regions. Extensive experiments on a benchmark dataset (i.e., the Stanford image paragraph dataset) demonstrate the effectiveness of our IMAP model.
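The abstract's core idea is that attention should remember how much each image region has already been described, so the decoder is pushed away from repeating itself and toward regions it has not yet covered. As a rough illustration only, the sketch below implements a generic coverage-penalized attention in plain Python. It is not IMAP's interactive key-value memory update, whose equations are not given on this page. The class name `CoverageAttention`, the dot-product scoring, and the subtractive `penalty` term are hypothetical choices made for clarity.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class CoverageAttention:
    """Hypothetical attention module that stores, per image region, how much
    attention that region has accumulated over previous decoding steps."""

    def __init__(self, region_keys: List[List[float]], penalty: float = 1.0):
        self.keys = region_keys                   # one feature vector per image region
        self.coverage = [0.0] * len(region_keys)  # attention history (coverage) per region
        self.penalty = penalty                    # assumed strength of the coverage penalty

    def step(self, query: List[float]) -> List[float]:
        # Score = relevance to the current decoder state minus a penalty
        # proportional to how much the region has already been attended.
        scores = [dot(query, k) - self.penalty * c
                  for k, c in zip(self.keys, self.coverage)]
        weights = softmax(scores)
        # Write the new alignment back into the history memory.
        self.coverage = [c + w for c, w in zip(self.coverage, weights)]
        return weights


# Usage: the same decoder query issued twice. On the second step the
# penalty shifts attention away from region 0 toward the less-covered region 2.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attn = CoverageAttention(regions, penalty=2.0)
query = [1.0, 0.0]
first = attn.step(query)
second = attn.step(query)
```

Because each step takes a softmax over all regions, one word can draw on several regions at once, and no region is ever excluded from later steps; the penalty only makes repeated attention less likely. That loosely mirrors the many-to-many alignment and the anti-repetition motivation described in the abstract, under the assumptions stated above.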
Anthology ID: 2020.coling-main.279
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3132–3142
URL: https://aclanthology.org/2020.coling-main.279
DOI: 10.18653/v1/2020.coling-main.279
Cite (ACL): Chunpu Xu, Yu Li, Chengming Li, Xiang Ao, Min Yang, and Jinwen Tian. 2020. Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3132–3142, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning (Xu et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.279.pdf
Data: Image Paragraph Captioning