
Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras

  • Conference paper

Advances in Multimedia Information Processing – PCM 2015 (PCM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9315)

Abstract

This paper proposes a real-time human action recognition approach for static video surveillance systems. The approach predicts human actions using temporal images and convolutional neural networks (CNNs). A CNN is a deep learning model that automatically learns features from training videos. Although state-of-the-art methods achieve high accuracy, they consume substantial computational resources. Another problem is that many methods assume exact knowledge of human positions. Moreover, most current methods build complex handcrafted features for specific classifiers, which makes them difficult to apply in real-world applications. In this paper, a novel CNN model based on temporal images and a hierarchical action structure is developed for real-time human action recognition. The hierarchical action structure comprises three levels: an action layer, a motion layer, and a posture layer. The top layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN, so the model consists of three CNNs working together, and the layers are combined to represent many different kinds of action with a large degree of freedom. The developed approach was implemented and achieved superior performance on the ICVL action dataset, and the algorithm runs at around 20 frames per second.
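
As a rough illustration of the three-level hierarchy described in the abstract, the sketch below wires one small CNN per level (posture, motion, and action) over a stack of temporal images. It is written in PyTorch purely for illustration; every layer size, channel count, class count, and the choice of feeding only the latest frame to the posture network are assumptions, not details taken from the paper.

```python
# Minimal sketch of a three-level hierarchy: one small CNN per level
# (posture, motion, action), each consuming temporal-image input.
# All sizes and class counts below are illustrative assumptions.
import torch
import torch.nn as nn

class LevelCNN(nn.Module):
    """A small CNN used at one level of the hierarchy."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Three CNNs working together, one per level of the hierarchy.
posture_net = LevelCNN(in_channels=1, num_classes=4)  # coarse body postures
motion_net = LevelCNN(in_channels=5, num_classes=3)   # motion over a frame stack
action_net = LevelCNN(in_channels=5, num_classes=8)   # finer-grained actions

stack = torch.randn(1, 5, 64, 64)     # hypothetical 5-frame temporal image stack
posture = posture_net(stack[:, -1:])  # posture from the most recent frame only
motion = motion_net(stack)
action = action_net(stack)
# Combining the three per-level predictions yields the final action label.
```

Splitting the work across three small per-level networks rather than one large model is one plausible way to meet the roughly 20 fps budget the paper reports.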



Acknowledgements

This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ICT R&D Program 2015 (Project ID: 1391203002-130010200).

Author information

Correspondence to Hakil Kim.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jin, C.B., Li, S., Do, T.D., Kim, H. (2015). Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras. In: Ho, Y.S., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing – PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol. 9315. Springer, Cham. https://doi.org/10.1007/978-3-319-24078-7_33

  • DOI: https://doi.org/10.1007/978-3-319-24078-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24077-0

  • Online ISBN: 978-3-319-24078-7

  • eBook Packages: Computer Science, Computer Science (R0)
