Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras

Jin, Cheng-Bin; Li, Shengzhe; Do, Trung Dung; Kim, Hakil

doi:10.1007/978-3-319-24078-7_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9315)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

This paper proposes a real-time human action recognition approach for static video surveillance systems. The approach predicts human actions using temporal images and convolutional neural networks (CNNs). A CNN is a type of deep learning model that can automatically learn features from training videos. Although state-of-the-art methods have shown high accuracy, they consume substantial computational resources. Another problem is that many methods assume exact knowledge of human positions. Moreover, most current methods build complex handcrafted features for specific classifiers. These methods are therefore difficult to apply in real-world applications. In this paper, a novel CNN model based on temporal images and a hierarchical action structure is developed for real-time human action recognition. The hierarchical action structure includes three levels: an action layer, a motion layer, and a posture layer. The top layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN, so the model has three CNNs working together; the layers are combined to represent many different kinds of action with a large degree of freedom. The developed approach was implemented and achieved superior performance on the ICVL action dataset; the algorithm runs at around 20 frames per second.
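
The abstract does not specify how the temporal images are built or how the three CNNs are wired together, so the following is only a minimal sketch under stated assumptions. It assumes the temporal image is a motion-history-style image obtained by frame differencing, and that each of the three layers (posture, motion, action) runs its own classifier on that same image. The function names, the parameters `tau` and `diff_threshold`, and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]    # grayscale frame: rows of 0..255 intensities
Image = List[List[float]]  # temporal image: rows of recency values


def update_motion_history(mhi: Image, prev: Frame, curr: Frame,
                          tau: float = 10.0,
                          diff_threshold: int = 30) -> Image:
    """One update step of a motion-history-style temporal image.

    ASSUMPTION: this decay rule is illustrative, not the paper's exact
    definition. Pixels that changed by more than diff_threshold are set
    to tau; all other pixels decay by 1, floored at 0.
    """
    out: Image = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        row = []
        for h, p, c in zip(mhi_row, prev_row, curr_row):
            if abs(c - p) > diff_threshold:
                row.append(tau)                 # recent motion
            else:
                row.append(max(0.0, h - 1.0))   # older motion fades
        out.append(row)
    return out


Classifier = Callable[[Image], str]


def recognize(temporal_image: Image,
              posture_net: Classifier,
              motion_net: Classifier,
              action_net: Classifier) -> Tuple[str, str, str]:
    """Run one classifier per layer and combine the three labels.

    ASSUMPTION: the three classifiers stand in for the three CNNs and
    all receive the same temporal image. Returns labels ordered from
    the bottom layer to the top: (posture, motion, action).
    """
    return (posture_net(temporal_image),
            motion_net(temporal_image),
            action_net(temporal_image))


if __name__ == "__main__":
    prev = [[0, 0], [0, 0]]
    curr = [[0, 255], [0, 0]]
    mhi = [[0.0, 0.0], [0.0, 0.0]]
    mhi = update_motion_history(mhi, prev, curr, tau=3.0)
    # Stub classifiers with hypothetical labels, in place of trained CNNs.
    print(recognize(mhi,
                    lambda img: "standing",
                    lambda img: "stationary",
                    lambda img: "texting"))
```

Because each layer emits its own label, the combined output is a tuple, and the number of representable actions grows with the product of the per-layer label counts. This is one way to read the abstract's remark about a large degree of freedom.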

Acknowledgements

This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea in the ICT R&D Program 2015 (Project ID: 1391203002-130010200).

Author information

Authors and Affiliations

Information and Communication Engineering, Inha University, Incheon, Korea
Cheng-Bin Jin, Shengzhe Li, Trung Dung Do & Hakil Kim

Corresponding author

Correspondence to Hakil Kim.

Editor information

Editors and Affiliations

Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Jitao Sang
KAIST, Daejeon, Korea (Republic of)
Yong Man Ro
KAIST, Daejeon, Korea (Republic of)
Junmo Kim
College of Computer Science, Zhejiang University, Hangzhou, China
Fei Wu

About this paper

Cite this paper

Jin, CB., Li, S., Do, T.D., Kim, H. (2015). Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9315. Springer, Cham. https://doi.org/10.1007/978-3-319-24078-7_33

DOI: https://doi.org/10.1007/978-3-319-24078-7_33
Published: 15 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24077-0
Online ISBN: 978-3-319-24078-7
eBook Packages: Computer Science, Computer Science (R0)