Abstract
This paper proposes a real-time human action recognition approach to static video surveillance systems. This approach predicts human actions using temporal images and convolutional neural networks (CNN). CNN is a type of deep learning model that can automatically learn features from training videos. Although the state-of-the-art methods have shown high accuracy, they consume a lot of computational resources. Another problem is that many methods assume that exact knowledge of human positions. Moreover, most of the current methods build complex handcrafted features for specific classifiers. Therefore, these kinds of methods are difficult to apply in real-world applications. In this paper, a novel CNN model based on temporal images and a hierarchical action structure is developed for real-time human action recognition. The hierarchical action structure includes three levels: action layer, motion layer, and posture layer. The top layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN, which means that this model has three CNNs working together; layers are combined to represent many different kinds of action with a large degree of freedom. The developed approach was implemented and achieved superior performance for the ICVL action dataset; the algorithm can run at around 20 frames per second.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Oh, S., Hoogs, A., Perera, A., et al.: A large-scale benchmark dataset for event recognition in surveillance video. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island ,pp. 3153–3160 (2011)
Vahdat, A., Gao, B., Ranjbar, M., et al.: A discriminative key pose sequence model for recognizing human interactions. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1729–1736, Barcelona (2011)
Lan, T., Wang, Y., Yang, W., et al.: Discriminative latent models for recognizing contextual group activities. IEEE Trans. Pattern Anal. Mach. Intell. 34(8), 1549–1562 (2011)
Kim, I., Oh, S., Vahdat, A., et al.: Segmental multi-way local polling for video recognition. In: Proceedings of the 21st ACM International Conference on Multimedia, MM 2013, pp. 637–640, New York (2013)
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, pp. 2847–2854 (2011)
Ryoo, M.S., Matthies, L.: First-person activity recognition: what are they doing to me? In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, pp. 2730–2737 (2013)
Davis, J.W., Bobick, A.F.: The representation and recognition of human movement using temporal templates. In: 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, pp. 928–934 (1997)
Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 3(3), 257–267 (2001)
Blank, M., Gorelick, L., Schechtman, E., et al.: Actions as space-time shapes. In: 2005 Tenth IEEE International Conference on Computer Vision (ICCV), Beijing, pp. 1395–1402 (2005)
Tang, K., Fei-Fei, L., Koller,.D.: Learning latent temporal structure for complex event detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, pp. 1250–1257 (2012)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, New South Wales, pp. 3551–3558 (2013)
Jiang, Z., Lin, Z., Davis, L.S.: A unified tree-based framework for joint action localization, recognition and segmentation. Comput. Vis. Image Underst. 117(10), 1345–1355 (2013)
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, pp. 1–8 (2008)
Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: 2009 IEEE Conference on Computer vision and Pattern Recognition (CVPR), Miami, Florida, pp. 2929–2936 (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012)
Ji, S., Xu, W., Yang, M., et al.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, pp. 1653–1660 (2014)
Sun, L., Jia, K., Chan, T., et al.: DL-SFA: deeply-learned slow feature analysis for action recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, pp. 2625–2632 (2014)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, California, pp. 886–893 (2005)
Acknowledgements
This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea in the ICT R&D Program 2015 (Project ID: 1391203002-130010200).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jin, CB., Li, S., Do, T.D., Kim, H. (2015). Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science(), vol 9315. Springer, Cham. https://doi.org/10.1007/978-3-319-24078-7_33
Download citation
DOI: https://doi.org/10.1007/978-3-319-24078-7_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24077-0
Online ISBN: 978-3-319-24078-7
eBook Packages: Computer ScienceComputer Science (R0)