Abstract
Unlike visual captioning, which describes a single image concretely, visual storytelling aims to generate an imaginative paragraph based on a deep understanding of a given image stream. It is more challenging because it requires inferring contextual relationships among the images. Intuitively, humans tend to tell a story around a central idea that is continually expressed as the storytelling unfolds. We therefore propose the Human-Like StoryTeller (HLST), a hierarchical neural network with a gated memory module that imitates the human storytelling process. First, we use a hierarchical decoder to integrate contextual information effectively. Second, we introduce a memory module that serves as the story's central idea to enhance the coherence of the generated stories; a multi-head attention mechanism with a self-adjusting query is employed to initialize the memory module by distilling the salient information from the visual semantic features. Finally, we equip the memory module with a gating mechanism to guide story generation dynamically: during generation, the information already expressed is erased from the memory under the control of read and write gates. Experimental results show that our approach significantly outperforms state-of-the-art (SOTA) methods.
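To make the gated memory described in the abstract more concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation: the memory is initialized by multi-head attention over the visual semantic features using a learnable ("self-adjusting") query, and at each decoding step a read gate selects how much of the memory guides generation while a write gate erases the portion that has already been expressed. All class, method, and parameter names (GatedMemory, init_memory, step, dim, num_heads) are hypothetical, and the exact gate formulations are assumptions inferred from the abstract.

```python
import torch
import torch.nn as nn


class GatedMemory(nn.Module):
    """Minimal sketch of a gated memory that stores the story's central idea."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        # Learnable ("self-adjusting") query used to distill salient
        # information from the visual semantic features into the memory.
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.read_gate = nn.Linear(2 * dim, dim)
        self.write_gate = nn.Linear(2 * dim, dim)

    def init_memory(self, feats):
        # feats: (batch, num_images, dim) visual semantic features of the stream.
        q = self.query.expand(feats.size(0), -1, -1)
        mem, _ = self.attn(q, feats, feats)   # multi-head attention read-out
        return mem.squeeze(1)                 # (batch, dim) initial memory

    def step(self, mem, hidden):
        # hidden: current decoder hidden state, (batch, dim).
        gate_in = torch.cat([mem, hidden], dim=-1)
        read = torch.sigmoid(self.read_gate(gate_in))
        context = read * mem                  # memory content guiding this step
        write = torch.sigmoid(self.write_gate(gate_in))
        mem = (1.0 - write * read) * mem      # erase what has just been expressed
        return context, mem
```

In this reading, a hierarchical decoder would call init_memory once per image stream and step once per generated sentence, feeding the returned context into its hidden state so that later sentences rely less on information the story has already covered.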
L. Zhang and Y. Kong—Equal contribution.
Acknowledgement
This research is supported by the National Key R&D Program of China (Nos. 2017YFC0820700 and 2018YFB1004700).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, L. et al. (2021). Human-Like Storyteller: A Hierarchical Network with Gated Memory for Visual Storytelling. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2021. Lecture Notes in Computer Science, vol. 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_21
DOI: https://doi.org/10.1007/978-3-030-77964-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77963-4
Online ISBN: 978-3-030-77964-1