AnomalyNet: a spatiotemporal motion-aware CNN approach for detecting anomalies in real-world autonomous surveillance | The Visual Computer

  • Original article
The Visual Computer

Abstract

Anomaly detection is central to the development of autonomous monitoring systems. Real-world anomalous events are complicated by diverse human behaviors and class variations. Recognizing suspicious behavior depends on the speed, duration, and motion features of an activity. Fast activities can be captured within a few video frames, whereas slow actions may take several hundred frames to define an anomalous action. Furthermore, a video is more than just a stack of frames with spatiotemporal representations. Most existing approaches struggle to learn fast and slow variable-speed activities simultaneously and focus primarily on spatiotemporal features alone. Modeling the spatiotemporal and motion relationships between frames together can help understand the actions better, and motion features combined with spatiotemporal representations yield better performance. Our contribution is two-fold. First, a novel dynamic frame-skipping approach is proposed to generate meaningful representations of spatiotemporal frames and optical-flow-based motion representations for variable-speed anomalous actions. Second, AnomalyNet, a new end-to-end deep architecture, is designed to simultaneously learn both spatiotemporal and motion features in image sequences. AnomalyNet is evaluated on challenging real-world anomaly detection datasets. The results show that the proposed model achieves a competitive AUC of 86.1% on the real-world UCF-Crime dataset and a superior AUC of 99.87% on the challenging ShanghaiTech dataset compared to state-of-the-art unsupervised, weakly-supervised, and fully-supervised anomaly detection methods. Furthermore, the model achieved the highest F1 score for both fast and slow variable-speed anomalous activities, such as explosions, road accidents, robbery, and stealing, for real-world autonomous surveillance.
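The abstract does not state the rule behind the dynamic frame-skipping step, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a fixed per-clip frame budget and a stride that grows with clip length, so that short (fast) activities keep every frame while long (slow) activities are sampled sparsely. The function name `dynamic_skip_indices` and the default `budget` of 16 frames are hypothetical.

```python
from typing import List


def dynamic_skip_indices(num_frames: int, budget: int = 16) -> List[int]:
    """Return at most `budget` frame indices spread evenly over a clip.

    Assumption (not from the paper): the skip stride is simply
    num_frames / budget, so longer clips skip more frames.
    """
    if num_frames <= 0 or budget <= 0:
        return []
    if num_frames <= budget:
        # Short, fast activity: keep every frame.
        return list(range(num_frames))
    # Long, slow activity: the stride scales with the clip length.
    stride = num_frames / budget
    return [int(i * stride) for i in range(budget)]


if __name__ == "__main__":
    print(dynamic_skip_indices(10))    # all 10 frames are kept
    print(dynamic_skip_indices(320))   # every 20th frame is kept
```

Under these assumptions the selected indices could feed both the frame stream and an optical-flow computation between consecutive selected frames, so that both inputs cover the same time span regardless of how long the activity lasts.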

Data availability

Not applicable.

Funding

This research was supported by the PDE-GIR project, which received funding from the European Union's Horizon 2020 research and innovation program under Marie Skłodowska-Curie grant agreement number 778035.

Author information

Contributions

The research direction was conceived and designed by AM, ABS, and ZH. AM proposed and implemented the methodology, conducted the experiments, and wrote the paper. ABS and ZH examined the data and reviewed the results. ABS and AM contributed reagents, materials, and analysis tools. All authors have reviewed and approved the published version of the manuscript.

Corresponding author

Correspondence to Zulfiqar Habib.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Informed consent statement

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 4.

Table 4 Architecture details of the proposed AnomalyNet

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mumtaz, A., Sargano, A.B. & Habib, Z. AnomalyNet: a spatiotemporal motion-aware CNN approach for detecting anomalies in real-world autonomous surveillance. Vis Comput 40, 7823–7844 (2024). https://doi.org/10.1007/s00371-023-03210-4

  • DOI: https://doi.org/10.1007/s00371-023-03210-4
