Gated Extra Memory Recurrent Unit for Learning Video Representations | SpringerLink

Gated Extra Memory Recurrent Unit for Learning Video Representations

  • Conference paper
Advances in Artificial Intelligence (JSAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)


Abstract

Convolutional recurrent neural networks (ConvRNNs) are widely used for spatiotemporal modeling tasks, including video frame prediction. A major drawback of existing ConvRNNs is the amount of computing and memory resources they require, which can hinder practical applications on embedded devices. To reduce these costs, we propose 1) a new gated architecture for the recurrent unit with an extra temporal memory and 2) the replacement of the computationally demanding convolution with a more lightweight Hadamard product. Adopting such constraints can degrade performance, but we show that the proposed model produces better results with reduced computation and memory. Quantitative evaluation on the Moving MNIST dataset shows that overall video frame prediction performance improves by 13% in terms of MSE and by 3% in terms of SSIM over the conventional ConvLSTM baseline, without increasing the number of parameters or multiplications. Further, the Hadamard product replacement improves on the baseline MSE by 5% while reducing the number of parameters by 14% and the number of multiplications by 25%.
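The abstract describes the two ideas only at a high level. As a rough illustration of idea 2), the sketch below shows a ConvLSTM-style cell in which the hidden-to-hidden convolutions are replaced by learned elementwise (Hadamard) products, one common way to realize the multiplication reduction described above. This is a minimal sketch assuming PyTorch; the class name, constructor signature, and fixed (height, width) weight shape are our assumptions, not the paper's formulation, and the gated extra-memory mechanism of idea 1) is omitted because its equations are not given in this excerpt.

    import torch
    import torch.nn as nn

    class HadamardConvLSTMCell(nn.Module):
        """ConvLSTM-style cell whose hidden-to-hidden convolutions are
        replaced by learned elementwise (Hadamard) products.

        Illustrative only: the class name and the fixed (height, width)
        weight shape are assumptions, not the paper's formulation."""

        def __init__(self, in_channels, hidden_channels, height, width,
                     kernel_size=3):
            super().__init__()
            # Input-to-hidden path keeps a convolution and computes all
            # four gate pre-activations in a single call.
            self.conv_x = nn.Conv2d(in_channels, 4 * hidden_channels,
                                    kernel_size, padding=kernel_size // 2)
            # Hidden-to-hidden path: one learned weight tensor per gate,
            # applied elementwise instead of by convolution.
            self.w_h = nn.Parameter(
                0.01 * torch.randn(4, hidden_channels, height, width))

        def forward(self, x, state):
            h, c = state
            xi, xf, xo, xg = self.conv_x(x).chunk(4, dim=1)
            # Hadamard products replace the recurrent convolutions.
            hi, hf, ho, hg = (self.w_h[k] * h for k in range(4))
            i = torch.sigmoid(xi + hi)  # input gate
            f = torch.sigmoid(xf + hf)  # forget gate
            o = torch.sigmoid(xo + ho)  # output gate
            g = torch.tanh(xg + hg)     # candidate cell update
            c_next = f * c + i * g
            h_next = o * torch.tanh(c_next)
            return h_next, (h_next, c_next)

    # Example: one step on a batch of 64x64 single-channel frames.
    cell = HadamardConvLSTMCell(1, 16, 64, 64)
    x = torch.randn(8, 1, 64, 64)
    h = c = torch.zeros(8, 16, 64, 64)
    out, (h, c) = cell(x, (h, c))

The savings come from the recurrent path: per pixel and per gate, a convolution over a C-channel hidden state costs roughly k^2 x C^2 multiplications, while the elementwise product costs only C. Tying the Hadamard weights to a fixed spatial size is one design choice here; a per-channel weight would keep the cell resolution-agnostic at the cost of less spatial flexibility.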



Author information

Corresponding author

Correspondence to Atsunori Kanemura.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vazhenina, D., Kanemura, A. (2021). Gated Extra Memory Recurrent Unit for Learning Video Representations. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_15

