Abstract
This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of learned parameters, trains the model's components incrementally, and applies a stepwise, label-based strategy for updating the adversarial part. Subsequently, we introduce an attention mechanism to SUM-GAN-sl in two ways: (i) by integrating an attention layer within the variational auto-encoder (VAE) of the architecture (SUM-GAN-VAAE), and (ii) by replacing the VAE with a deterministic attention auto-encoder (SUM-GAN-AAE). Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement over the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art. The software is publicly available at: https://github.com/e-apostolidis/SUM-GAN-AAE.
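As a rough illustration of how an attention mechanism can weight the parts of a video, the sketch below computes generic dot-product attention over a sequence of per-frame feature vectors. It is not the attention layer of SUM-GAN-VAAE or SUM-GAN-AAE, which the abstract does not specify; the function names (`softmax`, `attend`), the dot-product scoring, and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_states, query):
    """Generic dot-product attention (illustrative assumption, not the paper's layer).

    frame_states: list of per-frame feature vectors (lists of floats).
    query: a vector of the same dimension, e.g. a decoder state.
    Returns the attention weights (one per frame) and the weighted context vector.
    """
    # Energy of each frame: dot product between its state and the query.
    energies = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frame_states]
    # Normalize the energies so the weights sum to 1.
    weights = softmax(energies)
    # Context vector: weighted sum of the frame states.
    dim = len(frame_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, frame_states)) for d in range(dim)]
    return weights, context


# Toy example: three 2-dimensional frame states and one query vector.
weights, context = attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
```

In this toy example the third frame aligns best with the query, so it receives the largest weight and contributes most to the context vector.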
Notes
1. Importance scores were defined based on a uniform distribution of probabilities, and the experiment was repeated 100 times.
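The procedure in the note above can be sketched as follows. Only the uniform sampling of importance scores and the 100 repetitions come from the note; the top-k frame selection, the summary budget, the F-score comparison against a reference set, and all function names are assumptions added to make the example self-contained.

```python
import random


def random_importance_scores(num_frames, rng):
    """Draw one importance score per frame from a uniform distribution on [0, 1)."""
    return [rng.random() for _ in range(num_frames)]


def top_k_frames(scores, k):
    """Indices of the k highest-scoring frames (assumed selection rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def f_score(selected, reference):
    """F-score between two sets of frame indices (assumed evaluation measure)."""
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def random_baseline(num_frames, reference, budget, runs=100, seed=0):
    """Average F-score of a random summarizer over `runs` repetitions."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    k = int(num_frames * budget)
    total = 0.0
    for _ in range(runs):
        scores = random_importance_scores(num_frames, rng)
        total += f_score(top_k_frames(scores, k), reference)
    return total / runs


# Hypothetical video of 100 frames whose reference summary is the first 15 frames.
average = random_baseline(100, set(range(15)), budget=0.15)
```

Averaging over many runs smooths out the variance of any single random draw, which is presumably why the experiment was repeated.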
Acknowledgments
This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I. (2020). Unsupervised Video Summarization via Attention-Driven Adversarial Learning. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_40
DOI: https://doi.org/10.1007/978-3-030-37731-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)