Gated Extra Memory Recurrent Unit for Learning Video Representations | SpringerLink

Gated Extra Memory Recurrent Unit for Learning Video Representations

  • Conference paper
Advances in Artificial Intelligence (JSAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)


Abstract

Convolutional recurrent neural networks (ConvRNNs) are widely used for spatiotemporal modeling tasks, including video frame prediction. A major drawback of existing ConvRNNs is the amount of computing and memory resources they require, which can hinder practical applications on embedded devices. To reduce these costs, we propose 1) a new gated architecture for the recurrent unit with an extra temporal memory and 2) the replacement of the computationally demanding convolution with a more lightweight Hadamard product. Adopting such constraints can degrade performance, but we show that the proposed model produces better results with reduced computation and memory. Quantitative evaluation on the Moving MNIST dataset shows that overall video frame prediction performance improves by 13% in terms of MSE and by 3% in terms of SSIM over the conventional ConvLSTM baseline, without increasing the number of parameters or multiplications. Further, the Hadamard product replacement improves on the baseline MSE by 5% while reducing the number of parameters by 14% and the number of multiplications by 25%.
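The abstract describes the two ideas only at a high level. As a rough illustration of idea 2), the sketch below shows a ConvLSTM-style cell in which the hidden-to-hidden convolutions are replaced by learned elementwise (Hadamard) products, one common way to realize the multiplication reduction described above. This is a minimal sketch assuming PyTorch; the class name, constructor signature, and fixed (height, width) weight shape are our assumptions, not the paper's formulation, and the gated extra-memory mechanism of idea 1) is omitted because its equations are not given in this excerpt.

    import torch
    import torch.nn as nn

    class HadamardConvLSTMCell(nn.Module):
        """ConvLSTM-style cell whose hidden-to-hidden convolutions are
        replaced by learned elementwise (Hadamard) products.

        Illustrative only: the class name and the fixed (height, width)
        weight shape are assumptions, not the paper's formulation."""

        def __init__(self, in_channels, hidden_channels, height, width,
                     kernel_size=3):
            super().__init__()
            # Input-to-hidden path keeps a convolution and computes all
            # four gate pre-activations in a single call.
            self.conv_x = nn.Conv2d(in_channels, 4 * hidden_channels,
                                    kernel_size, padding=kernel_size // 2)
            # Hidden-to-hidden path: one learned weight tensor per gate,
            # applied elementwise instead of by convolution.
            self.w_h = nn.Parameter(
                0.01 * torch.randn(4, hidden_channels, height, width))

        def forward(self, x, state):
            h, c = state
            xi, xf, xo, xg = self.conv_x(x).chunk(4, dim=1)
            # Hadamard products replace the recurrent convolutions.
            hi, hf, ho, hg = (self.w_h[k] * h for k in range(4))
            i = torch.sigmoid(xi + hi)  # input gate
            f = torch.sigmoid(xf + hf)  # forget gate
            o = torch.sigmoid(xo + ho)  # output gate
            g = torch.tanh(xg + hg)     # candidate cell update
            c_next = f * c + i * g
            h_next = o * torch.tanh(c_next)
            return h_next, (h_next, c_next)

    # Example: one step on a batch of 64x64 single-channel frames.
    cell = HadamardConvLSTMCell(1, 16, 64, 64)
    x = torch.randn(8, 1, 64, 64)
    h = c = torch.zeros(8, 16, 64, 64)
    out, (h, c) = cell(x, (h, c))

The savings come from the recurrent path: per pixel and per gate, a convolution over a C-channel hidden state costs roughly k^2 x C^2 multiplications, while the elementwise product costs only C. Tying the Hadamard weights to a fixed spatial size is one design choice here; a per-channel weight would keep the cell resolution-agnostic at the cost of less spatial flexibility.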



Author information

Corresponding author

Correspondence to Atsunori Kanemura.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vazhenina, D., Kanemura, A. (2021). Gated Extra Memory Recurrent Unit for Learning Video Representations. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_15

