Abstract
Automatic detection of human violence in video surveillance receives considerable attention because of its applications in security, monitoring, and prevention systems. Detecting violence in real time could prevent criminal acts and even save lives. Many approaches to violence detection in video surveillance have been proposed; however, most of them focus on effectiveness rather than efficiency, aiming to surpass the accuracy of other proposals rather than to be applicable in real scenarios and in real time. In this work, we propose an efficient deep learning model for recognizing human violence in real time, composed of two modules: a spatial attention module (SA) and a temporal attention module (TA). SA extracts spatial features and regions of interest by computing the difference of two consecutive frames and applying morphological dilation. TA extracts temporal features by averaging the three RGB channels of each frame into a single channel, so that three frames can be stacked as the input to a 2D CNN backbone. The proposal was evaluated for efficiency, accuracy, and real-time performance. The results show that our model achieves the best efficiency among the compared proposals, with accuracy very close to that of the best proposal and latency close to real time. Therefore, our model can be applied in real scenarios and in real time.
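As a rough illustration of the two modules described in the abstract, the sketch below shows how a frame-difference-based spatial attention mask and a channel-averaged temporal input could be built with OpenCV and NumPy. The function names, threshold, and kernel size are illustrative assumptions for a minimal sketch, not the paper's exact implementation or parameters.

```python
# Minimal sketch of the SA and TA ideas described in the abstract.
# Threshold value and dilation kernel size are assumptions, not the paper's settings.
import cv2
import numpy as np

def spatial_attention(prev_frame, curr_frame, kernel_size=5):
    """Region-of-interest mask from the difference of two consecutive frames,
    expanded with morphological dilation."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # assumed threshold
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)

def temporal_attention(frames):
    """Average the RGB channels of each of three frames into one channel,
    then stack the three grayscale maps as a 3-channel input for a 2D CNN backbone."""
    assert len(frames) == 3
    gray_maps = [frame.mean(axis=2) for frame in frames]    # (H, W) per frame
    return np.stack(gray_maps, axis=-1).astype(np.float32)  # (H, W, 3)
```

In this reading, the temporal module reuses an ordinary 3-channel 2D CNN input, but with the channel axis carrying three time steps instead of color, which is what keeps the model lightweight compared with 3D CNN or two-stream designs.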
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huillcen Baca, H.A., de Luz Palomino Valdivia, F., Solis, I.S., Cruz, M.A., Caceres, J.C.G. (2023). Human Violence Recognition in Video Surveillance in Real-Time. In: Arai, K. (eds) Advances in Information and Communication. FICC 2023. Lecture Notes in Networks and Systems, vol 652. Springer, Cham. https://doi.org/10.1007/978-3-031-28073-3_52
DOI: https://doi.org/10.1007/978-3-031-28073-3_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28072-6
Online ISBN: 978-3-031-28073-3
eBook Packages: Intelligent Technologies and Robotics (R0)