Abstract
Modeling abnormal spatiotemporal events is challenging because abnormal activities account for only a small fraction of a surveillance stream. We address this issue with a normality modeling approach, in which abnormalities are detected as deviations from normal patterns. To this end, we propose a residual spatiotemporal autoencoder (STAE) that is trainable end-to-end for anomaly detection in surveillance videos. Irregularities are detected using the reconstruction loss: normal frames are reconstructed well, with a low reconstruction cost, whereas frames with a high reconstruction cost are identified as abnormal. We evaluate the effect of residual connections in the STAE architecture and present good practices for training an autoencoder for video anomaly detection on benchmark datasets, namely CUHK-Avenue, UCSD-Ped2, and Live Videos. Comparisons with existing approaches indicate that adding residual blocks is more effective than going deeper with additional layers when training a spatiotemporal autoencoder that generalizes well across datasets.
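The reconstruction-based decision rule described in the abstract can be illustrated with a minimal sketch. It assumes the autoencoder has already produced reconstructions, so the model itself is not shown. The function names `frame_errors` and `flag_abnormal`, the mean squared error as the reconstruction cost, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def frame_errors(frames, recons):
    """Per-frame mean squared reconstruction error for (T, H, W) arrays."""
    frames = np.asarray(frames, dtype=np.float64)
    recons = np.asarray(recons, dtype=np.float64)
    # Average the squared pixel differences over height and width.
    return ((frames - recons) ** 2).mean(axis=(1, 2))

def flag_abnormal(errors, threshold):
    """Mark frames whose reconstruction error exceeds a chosen threshold."""
    return np.asarray(errors) > threshold

# Toy data (assumed): 4 frames of 2x2 pixels standing in for video frames.
frames = np.zeros((4, 2, 2))
recons = np.zeros((4, 2, 2))
recons[0] += 0.1   # small reconstruction error on frame 0
recons[2] += 1.0   # large reconstruction error on frame 2

errors = frame_errors(frames, recons)
flags = flag_abnormal(errors, threshold=0.5)  # threshold is hypothetical
print(errors)
print(flags)
```

In this toy case only frame 2 exceeds the threshold. In practice the threshold would have to be chosen on held-out data, and the cost function may differ from plain mean squared error.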
Acknowledgements
The authors acknowledge the following funding agencies: the Council of Scientific and Industrial Research (CSIR) (09/1095(0043)/19-EMR-I) and the Department of Science and Technology, Government of India, which sanctioned project No. DST/CSRI/2017/131(G) under the Cognitive Science Research Initiative (CSRI).
Cite this article
Deepak, K., Chandrakala, S. & Mohan, C.K. Residual spatiotemporal autoencoder for unsupervised video anomaly detection. SIViP 15, 215–222 (2021). https://doi.org/10.1007/s11760-020-01740-1
DOI: https://doi.org/10.1007/s11760-020-01740-1