{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T07:26:19Z","timestamp":1746516379853,"version":"3.37.3"},"reference-count":43,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,5,30]],"date-time":"2019-05-30T00:00:00Z","timestamp":1559174400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"The worldwide utilization of surveillance cameras in smart cities has enabled researchers to analyze a gigantic volume of data to ensure automatic monitoring. An enhanced security system in smart cities, schools, hospitals, and other surveillance domains is mandatory for the detection of violent or abnormal activities to avoid any casualties which could cause social, economic, and ecological damages. Automatic detection of violence for quick actions is very significant and can efficiently assist the concerned departments. In this paper, we propose a triple-staged end-to-end deep learning violence detection framework. First, persons are detected in the surveillance video stream using a light-weight convolutional neural network (CNN) model to reduce and overcome the voluminous processing of useless frames. Second, a sequence of 16 frames with detected persons is passed to 3D CNN, where the spatiotemporal features of these sequences are extracted and fed to the Softmax classifier. Furthermore, we optimized the 3D CNN model using an open visual inference and neural networks optimization toolkit developed by Intel, which converts the trained model into intermediate representation and adjusts it for optimal execution at the end platform for the final prediction of violent activity. 
After detection of a violent activity, an alert is transmitted to the nearest police station or security department to take prompt preventive actions. We found that our proposed method outperforms the existing state-of-the-art methods for different benchmark datasets.","DOI":"10.3390\/s19112472","type":"journal-article","created":{"date-parts":[[2019,5,30]],"date-time":"2019-05-30T15:07:44Z","timestamp":1559228864000},"page":"2472","source":"Crossref","is-referenced-by-count":170,"title":["Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network"],"prefix":"10.3390","volume":"19","author":[{"given":"Fath U Min","family":"Ullah","sequence":"first","affiliation":[{"name":"Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul 143-747, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7538-2689","authenticated-orcid":false,"given":"Amin","family":"Ullah","sequence":"additional","affiliation":[{"name":"Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul 143-747, Korea"}]},{"given":"Khan","family":"Muhammad","sequence":"additional","affiliation":[{"name":"Department of Software, Sejong University, Seoul 143-747, Korea"}]},{"given":"Ijaz Ul","family":"Haq","sequence":"additional","affiliation":[{"name":"Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul 143-747, Korea"}]},{"given":"Sung Wook","family":"Baik","sequence":"additional","affiliation":[{"name":"Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul 143-747, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Batchuluun, G., Kim, Y., Kim, J., Hong, H., and Park, K. (2016). Robust behavior recognition in intelligent surveillance environments. 
Sensors, 16.","DOI":"10.3390\/s16071010"},{"key":"ref_2","unstructured":"Ullah, A., Muhammad, K., Del Ser, J., Baik, S.W., and Albuquerque, V. (2018). Activity Recognition using Temporal Optical Flow Convolutional Features and Multi-Layer LSTM. IEEE Trans. Ind. Electron."},{"key":"ref_3","unstructured":"Muhammad, K., Ahmad, J., Lv, Z., Bellavista, P., Yang, P., and Baik, S.W. (2018). Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Applications. IEEE Trans. Syst. Man Cybern. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ullah, A., Muhammad, K., Haq, I.U., and Baik, S.W. (2019). Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Gener. Comput. Syst.","DOI":"10.1016\/j.future.2019.01.029"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Muhammad, K., Khan, S., Elhoseny, M., Ahmed, S.H., and Baik, S.W. (2019). Efficient Fire Detection for Uncertain Surveillance Environment. IEEE Trans. Ind. Inform.","DOI":"10.1109\/TII.2019.2897594"},{"key":"ref_6","unstructured":"Greenfield, M. (2018, April 25). Change in the Number of Closed-Circuit Television (CCTV) Cameras in Public Places in South Korea. Available online: https:\/\/www.statista.com\/statistics\/651509\/south-korea-cctv-cameras\/."},{"key":"ref_7","unstructured":"Nievas, E.B., Suarez, O.D., Garc\u00eda, G.B., and Sukthankar, R. (2011, January 29\u201331). Violence detection in video using computer vision techniques. Proceedings of the International Conference on Computer Analysis of Images and Patterns, Seville, Spain."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Khan, S., Muhammad, K., Mumtaz, S., Baik, S.W., and de Albuquerque, V.H.C. (2019). Energy-Efficient Deep CNN for Smoke Detection in Foggy IoT Environment. 
IEEE Internet Things J.","DOI":"10.1109\/JIOT.2019.2896120"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/j.jocs.2018.12.003","article-title":"Multi-grade brain tumor classification using deep CNN with extensive data augmentation","volume":"30","author":"Sajjad","year":"2019","journal-title":"J. Comput. Sci."},{"key":"ref_10","unstructured":"Sajjad, M., Khan, S., Hussain, T., Muhammad, K., Sangaiah, A.K., Castiglione, A., Esposito, C., and Baik, S.W. (2018). CNN-based anti-spoofing two-tier multi-factor authentication system. Pattern Recognit. Lett."},{"key":"ref_11","unstructured":"Datta, A., Shah, M., and Lobo, N.D.V. (2002, January 11\u201315). Person-on-person violence detection in video data. Proceedings of the 16th International Conference on Pattern Recognition, Quebec, QC, Canada."},{"key":"ref_12","unstructured":"Nguyen, N.T., Phung, D.Q., Venkatesh, S., and Bui, H. (2005, January 20\u201325). Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mahadevan, V., Li, W., Bhalodia, V., and Vasconcelos, N. (2010, January 13\u201318). Anomaly detection in crowded scenes. Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539872"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hassner, T., Itcher, Y., and Kliper-Gross, O. (2012, January 16\u201321). Violent flows: Real-time detection of violent crowd behavior. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239348"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Huang, J.-F., and Chen, S.-L. 
(2014, January 19\u201321). Detection of violent crowd behavior based on statistical characteristics of the optical flow. Proceedings of the 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Xiamen, China.","DOI":"10.1109\/FSKD.2014.6980896"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"7327","DOI":"10.1007\/s11042-015-2648-8","article-title":"A new method for violence detection in surveillance scenes","volume":"75","author":"Zhang","year":"2016","journal-title":"Multimed. Tools Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.imavis.2016.01.006","article-title":"Violence detection using oriented violent flows","volume":"48","author":"Gao","year":"2016","journal-title":"Image Vis. Comput."},{"key":"ref_18","unstructured":"Chen, D., Wactlar, H., Chen, M.-Y., Gao, C., Bharucha, A., and Hauptmann, A. (2008, January 20\u201325). Recognition of aggressive human behavior using binary local motion descriptors. Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"De Souza, F.D., Chavez, G.C., do Valle Jr, E.A., and Ara\u00fajo, A.d.A. (September, January 30). Violence detection in video using spatio-temporal features. Proceedings of the 2010 23rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Gramado, Brazil.","DOI":"10.1109\/SIBGRAPI.2010.38"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, L., Gong, C., Yang, J., Wu, Q., and Yao, L. (2014, January 4\u20139). Violent video detection based on MoSIFT feature and sparse coding. 
Proceedings of the ICASSP, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854259"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/s00138-017-0830-x","article-title":"Detecting violent and abnormal crowd activity using temporal analysis of grey level co-occurrence matrix (GLCM)-based texture measures","volume":"28","author":"Lloyd","year":"2017","journal-title":"Mach. Vis. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/IJPCC-02-2017-0018","article-title":"Automatic fight detection in surveillance videos","volume":"13","author":"Fu","year":"2017","journal-title":"Int. J. Pervasive Comput. Commun."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sudhakaran, S., and Lanz, O. (September, January 29). Learning to detect violent videos using convolutional long short-term memory. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078468"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.eswa.2019.02.032","article-title":"A classification method based on optical flow for violence detection","volume":"127","author":"Mahmoodi","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/j.comnet.2019.01.028","article-title":"Real time Violence Detection Framework for Football Stadium comprising of Big Data Analysis and Deep Learning through Bidirectional LSTM","volume":"151","author":"Fenil","year":"2019","journal-title":"Comput. Netw."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhou, P., Ding, Q., Luo, H., and Hou, X. (2018). Violence detection in surveillance video using low-level features. 
PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0203668"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.eswa.2017.03.052","article-title":"Fuzzy system based human behavior recognition by combining behavior prediction and recognition","volume":"81","author":"Batchuluun","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1016\/j.ins.2018.07.027","article-title":"Raspberry Pi assisted facial expression recognition framework for smart security in law-enforcement services","volume":"479","author":"Sajjad","year":"2018","journal-title":"Inf. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/ACCESS.2017.2778011","article-title":"Action recognition in video sequences using deep Bi-directional LSTM with CNN features","volume":"6","author":"Ullah","year":"2018","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"8181","DOI":"10.1109\/ACCESS.2018.2889442","article-title":"Multiple Object Tracking via Feature Pyramid Siamese Networks","volume":"7","author":"Lee","year":"2019","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"9265","DOI":"10.1109\/ACCESS.2018.2890560","article-title":"DeepStar: Detecting Starring Characters in Movies","volume":"7","author":"Haq","year":"2019","journal-title":"IEEE Access"},{"key":"ref_32","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_33","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream convolutional networks for action recognition in videos. 
Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23\u201328). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_35","unstructured":"Shou, Z., Wang, D., and Chang, S.-F. (July, January 26). Temporal action localization in untrimmed videos via multi-stage cnns. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_37","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (July, January 26). Deep end2end voxel2voxel prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_38","unstructured":"Muhammad, K., Hussain, T., and Baik, S.W. (2018). Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recognit. Lett."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014, January 3\u20137). Caffe: Convolutional architecture for fast feature embedding. 
Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654889"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4787","DOI":"10.1109\/TIP.2018.2845742","article-title":"Fight Recognition in video using Hough Forests and 2D Convolutional Neural Network","volume":"27","author":"Serrano","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_41","unstructured":"Gracia, I.S., Suarez, O.D., Garcia, G.B., and Kim, T.-K. (2015). Fast fight detection. PLoS ONE, 10."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1999","DOI":"10.1007\/s13042-017-0682-8","article-title":"Detection and localization of crowd behavior using a novel tracklet-based model","volume":"9","author":"Rabiee","year":"2018","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Bilinski, P., and Bremond, F. (2016, January 23\u201326). Human violence recognition and detection in surveillance videos. 
Proceedings of the 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, USA.","DOI":"10.1109\/AVSS.2016.7738019"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/11\/2472\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T22:00:19Z","timestamp":1721340019000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/11\/2472"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,30]]},"references-count":43,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["s19112472"],"URL":"https:\/\/doi.org\/10.3390\/s19112472","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,5,30]]}}}