Unsupervised Video Summarization via Attention-Driven Adversarial Learning

Apostolidis, Evlampios; Adamantidou, Eleni; Metsai, Alexandros I.; Mezaris, Vasileios; Patras, Ioannis

doi:10.1007/978-3-030-37731-1_40

Evlampios Apostolidis¹⁶,¹⁷,
Eleni Adamantidou¹⁶,
Alexandros I. Metsai¹⁶,
Vasileios Mezaris¹⁶ &
Ioannis Patras¹⁷

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video, and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of learned parameters, performs incremental training of the model’s components, and applies a stepwise label-based strategy for updating the adversarial part. Subsequently, we introduce an attention mechanism to SUM-GAN-sl in two ways: (i) by integrating an attention layer within the variational auto-encoder (VAE) of the architecture (SUM-GAN-VAAE), and (ii) by replacing the VAE with a deterministic attention auto-encoder (SUM-GAN-AAE). Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement with respect to the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art. Software is publicly available at: https://github.com/e-apostolidis/SUM-GAN-AAE
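
To make the idea of an attention layer over video frames concrete, the following is a minimal sketch of generic dot-product attention in plain Python. It is an illustration under assumptions, not the SUM-GAN-AAE implementation: the abstract does not specify the attention formulation, and the names `attend`, `encoder_states` and `query`, as well as the toy numbers, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(encoder_states: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoding step.

    Assumption: energies are plain dot products between each frame's
    encoder state and the query; real models often add learned weights.
    """
    energies = [dot(h, query) for h in encoder_states]
    weights = softmax(energies)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three "frames" with 2-dimensional encoder states (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(states, query=[1.0, 1.0])
```

The weights form a probability distribution over frames, so the frame whose state is most similar to the query contributes most to the context vector passed on to the decoder.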

Notes

1. Importance scores were defined based on a uniform distribution of probabilities and the experiment was repeated 100 times.
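
A random baseline of this kind can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the note only states that scores are drawn from a uniform distribution and that the experiment is repeated 100 times. The 15% summary budget, the top-score frame selection, the F-score comparison and the toy ground truth are all assumptions made for the example.

```python
import random
from typing import List


def f_score(predicted: List[int], reference: List[int]) -> float:
    """F-score between two binary frame-selection vectors of equal length."""
    overlap = sum(p and r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


def random_summary(n_frames: int, budget: float, rng: random.Random) -> List[int]:
    """Draw uniform importance scores and keep the top-scoring frames."""
    scores = [rng.random() for _ in range(n_frames)]
    n_keep = max(1, int(n_frames * budget))  # assumed length budget
    ranked = sorted(range(n_frames), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(n_frames)]


def random_baseline(reference: List[int], budget: float = 0.15,
                    runs: int = 100, seed: int = 0) -> float:
    """Average F-score of random summaries over `runs` repetitions."""
    rng = random.Random(seed)
    total = sum(
        f_score(random_summary(len(reference), budget, rng), reference)
        for _ in range(runs)
    )
    return total / runs


# Hypothetical ground truth: 200 frames, frames 40..69 marked as the summary.
reference = [1 if 40 <= i < 70 else 0 for i in range(200)]
mean_f = random_baseline(reference)
```

Averaging over many repetitions reduces the variance of a single random draw, which makes the baseline a steadier point of comparison for a learned summarizer.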

References

Apostolidis, E., et al.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: AI4TV, ACM MM 2019 (2019)
Apostolidis, E., et al.: Fast shot segmentation combining global and local visual descriptors. In: IEEE ICASSP 2014, pp. 6583–6587 (2014)
Apostolidis, K., Apostolidis, E., Mezaris, V.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10704, pp. 29–41. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73603-7_3
Bahuleyan, H., et al.: Variational attention for sequence-to-sequence models. In: 27th COLING, pp. 1672–1682 (2018)
Cho, J.: PyTorch implementation of SUM-GAN (2017). https://github.com/j-min/Adversarial_Video_Summary. Accessed 18 Oct 2019
Elfeki, M., et al.: Video summarization via actionness ranking. In: IEEE WACV 2019, pp. 754–763 (2019)
Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Carneiro, G., You, S. (eds.) ACCV 2018. LNCS, vol. 11367, pp. 39–54. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21074-8_4
Feng, L., et al.: Extractive video summarizer with memory augmented neural networks. In: ACM MM 2018, pp. 976–983 (2018)
Fu, T., et al.: Attentive and adversarial learning for video summarization. In: IEEE WACV 2019, pp. 1579–1587 (2019)
Gygli, M., Grabner, H., Riemenschneider, H., Van Gool, L.: Creating summaries from user videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 505–520. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_33
Gygli, M., et al.: Video summarization by learning submodular mixtures of objectives. In: IEEE CVPR 2015, pp. 3090–3098 (2015)
Hochreiter, S., et al.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Ji, Z., et al.: Video summarization with attention-based encoder-decoder networks. IEEE Trans. Circ. Syst. Video Technol. 1 (2019)
Kaufman, D., et al.: Temporal tessellation: a unified approach for video analysis. In: IEEE ICCV 2017, pp. 94–104 (2017)
Lee, S., et al.: A memory network approach for story-based temporal summarization of 360 videos. In: IEEE CVPR 2018, pp. 1410–1419 (2018)
Mahasseni, B., et al.: Unsupervised video summarization with adversarial LSTM networks. In: IEEE CVPR 2017, pp. 2982–2991 (2017)
Otani, M., Nakashima, Y., Rahtu, E., Heikkilä, J., Yokoya, N.: Video summarization using deep semantic features. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10115, pp. 361–377. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54193-8_23
Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 540–555. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_35
Radford, A., et al.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR 2016 (2016)
Rochan, M., et al.: Video summarization by learning from unpaired data. In: IEEE CVPR 2019 (2019)
Rochan, M., Ye, L., Wang, Y.: Video summarization using fully convolutional sequence networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 358–374. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_22
Song, Y., et al.: TVSum: summarizing web videos using titles. In: IEEE CVPR 2015, pp. 5179–5187 (2015)
Szegedy, C., et al.: Going deeper with convolutions. In: IEEE CVPR 2015, pp. 1–9 (2015)
Wei, H., et al.: Video summarization via semantic attended networks. In: AAAI 2018, pp. 216–223 (2018)
Yuan, L., et al.: Cycle-SUM: cycle-consistent adversarial LSTM networks for unsupervised video summarization. In: AAAI 2019, pp. 9143–9150 (2019)
Yuan, Y., et al.: Video summarization by learning deep side semantic embedding. IEEE Trans. Circ. Syst. Video Technol. 29(1), 226–237 (2019)
Zhang, K., Chao, W.-L., Sha, F., Grauman, K.: Video summarization with long short-term memory. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 766–782. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_47
Zhang, Y., et al.: DTR-GAN: dilated temporal relational adversarial network for video summarization. In: ACM TURC 2019, pp. 89:1–89:6 (2019)
Zhang, Y., et al.: Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recogn. Lett. (2018)
Zhao, B., et al.: Hierarchical recurrent neural network for video summarization. In: ACM MM 2017, pp. 863–871 (2017)
Zhao, B., et al.: HSA-RNN: hierarchical structure-adaptive RNN for video summarization. In: IEEE/CVF CVPR 2018, pp. 7405–7414 (2018)
Zhou, K., et al.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI 2018, pp. 7582–7589 (2018)
Zhou, K., et al.: Video summarisation by classification with deep reinforcement learning. In: BMVC 2018 (2018)

Acknowledgments

This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.

Author information

Authors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thermi-Thessaloniki, Greece
Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai & Vasileios Mezaris
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
Evlampios Apostolidis & Ioannis Patras

Corresponding author

Correspondence to Vasileios Mezaris.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

About this paper

Cite this paper

Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I. (2020). Unsupervised Video Summarization via Attention-Driven Adversarial Learning. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_40

DOI: https://doi.org/10.1007/978-3-030-37731-1_40
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)
