An Attention-Aware Model for Human Action Recognition on Tree-Based Skeleton Sequences

  • Conference paper
Social Robotics (ICSR 2018)

Abstract

Skeleton-based human action recognition (HAR) has attracted considerable research attention because it is robust to variations in location and appearance. However, most existing methods treat the whole skeleton as a fixed pattern and do not consider the importance of different skeleton joints for action recognition. In this paper, a novel CNN-based attention-aware network is proposed. First, to describe the semantic meaning of skeletons and learn the discriminative joints over time, an attention generation network named Global Attention Network (GAN) is proposed to generate attention masks. Then, to encode the spatial structure of skeleton sequences, we design a tree-based traversal (TTTM) rule, which represents the skeleton structure, as a convolution unit of the main network. Finally, the GAN and the main network are cascaded into a single network that is trained end to end. Experiments show that the TTTM and GAN complement each other and that the whole network improves on the state of the art; for example, its classification accuracy reaches 83.6% and 89.5% on the NTU-RGBD CV and CS datasets, outperforming other methods.
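
The abstract does not spell out the traversal rule or the attention network, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that "tree-based traversal" can be read as a depth-first walk that steps back through each parent joint, so that neighbouring entries in the output are always physically connected, and that an attention mask is a softmax weight per joint. The 7-joint skeleton, the names `tree_traversal`, `softmax` and `attend_and_reorder`, and the hand-supplied scores are all hypothetical; in the paper the mask is produced by a learned network.

```python
import math

# Hypothetical 7-joint skeleton as child -> parent indices (root has parent -1).
# Names and topology are illustrative only, not taken from the paper.
JOINTS = ["hip", "spine", "head", "l_shoulder", "l_hand", "r_shoulder", "r_hand"]
PARENT = [-1, 0, 1, 1, 3, 1, 5]


def tree_traversal(parent, root=0):
    """Depth-first walk that revisits a joint after each of its subtrees,
    so consecutive entries are always adjacent in the skeleton tree."""
    children = {i: [] for i in range(len(parent))}
    for child, par in enumerate(parent):
        if par >= 0:
            children[par].append(child)

    order = []

    def visit(node):
        order.append(node)
        for child in children[node]:
            visit(child)
            order.append(node)  # step back to the parent joint

    visit(root)
    return order


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_reorder(frame, scores, order):
    """Scale each joint's (x, y, z) by its attention weight, then lay the
    joints out in traversal order. `scores` stands in for the output of a
    learned attention network (an assumption of this sketch)."""
    mask = softmax(scores)
    weighted = [tuple(w * c for c in joint) for joint, w in zip(frame, mask)]
    return [weighted[j] for j in order]


order = tree_traversal(PARENT)
frame = [(float(i), 0.0, 1.0) for i in range(len(JOINTS))]
scores = [0.0] * len(JOINTS)  # equal scores give every joint the same weight
row = attend_and_reorder(frame, scores, order)
print([JOINTS[j] for j in order])
```

A walk of this kind over a tree with n joints yields 2(n - 1) + 1 entries, so one frame of the toy skeleton becomes a row of 13 weighted joints; stacking such rows over time would give a 2D array that a convolution could slide over.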

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC, No. U1613209) and the Scientific Research Project of Shenzhen City (No. JCYJ20170306164738129, CKCY2017050810242781).

Author information

Corresponding author

Correspondence to Hong Liu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ding, R., Liu, C., Liu, H. (2018). An Attention-Aware Model for Human Action Recognition on Tree-Based Skeleton Sequences. In: Ge, S., et al. Social Robotics. ICSR 2018. Lecture Notes in Computer Science, vol 11357. Springer, Cham. https://doi.org/10.1007/978-3-030-05204-1_56

  • DOI: https://doi.org/10.1007/978-3-030-05204-1_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05203-4

  • Online ISBN: 978-3-030-05204-1

  • eBook Packages: Computer Science, Computer Science (R0)
