Abstract
Convolutional recurrent neural networks (ConvRNNs) are widely used for spatiotemporal modeling tasks such as video frame prediction. A major drawback of existing ConvRNNs is the large amount of computation and memory they require, which hinders practical deployment on embedded devices. To reduce these costs, we propose 1) a new gated recurrent unit architecture with an extra temporal memory and 2) the replacement of computationally demanding convolutions with lightweight Hadamard products. Although such constraints could degrade performance, we show that the proposed model produces better results with reduced computation and memory. Quantitative evaluation on the Moving MNIST dataset shows that overall video frame prediction performance improves by 13% in terms of MSE and by 3% in terms of SSIM over a conventional ConvLSTM baseline, without increasing the number of parameters or multiplications. Further, applying the Hadamard product replacement outperforms the baseline MSE by 5% while reducing the number of parameters by 14% and the number of multiplications by 25%.
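To give a sense of where the savings come from, the following is a minimal sketch, in NumPy, of the cost difference between a convolutional gate and a Hadamard-product gate on a hidden state of shape (C, H, W). The channel count, spatial size, and 3x3 kernel are illustrative assumptions, not the paper's exact configuration, and the gate shown is a generic sigmoid gate rather than the full proposed unit.

```python
import numpy as np

C, H, W, K = 8, 16, 16, 3  # channels, height, width, kernel size (assumed)

# Convolutional gate: hidden-to-hidden weights form a KxK kernel per
# channel pair -> C*C*K*K parameters; each of the C*H*W output
# activations needs C*K*K multiplications.
conv_params = C * C * K * K
conv_mults = C * H * W * (C * K * K)

# Hadamard gate: one weight per hidden activation -> C*H*W parameters
# and a single multiplication per activation.
hada_params = C * H * W
hada_mults = C * H * W

print(conv_params, conv_mults)  # 576 147456
print(hada_params, hada_mults)  # 2048 2048

# The Hadamard gate itself: sigmoid(W ∘ h), purely elementwise.
rng = np.random.default_rng(0)
h = rng.standard_normal((C, H, W))   # previous hidden state
w = rng.standard_normal((C, H, W))   # per-element gate weights
gate = 1.0 / (1.0 + np.exp(-(w * h)))
assert gate.shape == (C, H, W)
```

In this toy setting the elementwise gate needs roughly 70x fewer multiplications than the convolutional one; the exact ratio depends on the kernel size and channel count.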
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vazhenina, D., Kanemura, A. (2021). Gated Extra Memory Recurrent Unit for Learning Video Representations. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_15
DOI: https://doi.org/10.1007/978-3-030-73113-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73112-0
Online ISBN: 978-3-030-73113-7
eBook Packages: Intelligent Technologies and Robotics (R0)