Abstract
This paper presents a method of human action recognition based on skeleton key points, aiming to enable a robot to follow a leader in complex environments. We propose a two-stage human pose estimation model that combines a ResNet-based Single Shot Detector (SSD) with Convolutional Pose Machines (CPMs) to obtain the positions of human skeleton key points in 2D images. From this position information we construct structure vectors, and then extract feature models consisting of eight angle features and four modulus ratio features as the representation of actions. Finally, a multi-class SVM classifies the feature models for action recognition. The experimental results demonstrate the validity of the two-stage human pose estimation model for the task of human action recognition. Our method achieves 97% recognition accuracy on a self-collected dataset composed of six command actions.
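The feature construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the specific joint pairs, joint names, and the toy skeleton are assumptions for demonstration; the paper defines its own eight angle pairs and four modulus-ratio pairs.

```python
import numpy as np
from sklearn.svm import SVC

def angle_between(v1, v2):
    """Angle (radians) between two structure vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def extract_features(keypoints, angle_pairs, ratio_pairs):
    """Build a feature vector from 2D skeleton key points.

    keypoints:  dict mapping joint name -> (x, y)
    angle_pairs: list of ((a, b), (c, d)); each yields the angle between
                 structure vectors b-a and d-c
    ratio_pairs: list of ((a, b), (c, d)); each yields |b-a| / |d-c|
    """
    feats = []
    for (a, b), (c, d) in angle_pairs:
        v1 = np.subtract(keypoints[b], keypoints[a])
        v2 = np.subtract(keypoints[d], keypoints[c])
        feats.append(angle_between(v1, v2))
    for (a, b), (c, d) in ratio_pairs:
        num = np.linalg.norm(np.subtract(keypoints[b], keypoints[a]))
        den = np.linalg.norm(np.subtract(keypoints[d], keypoints[c]))
        feats.append(num / den)
    return np.array(feats)

# Toy example: a right arm bent at 90 degrees.
kp = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}
angle_pairs = [(("shoulder", "elbow"), ("elbow", "wrist"))]
ratio_pairs = [(("shoulder", "elbow"), ("elbow", "wrist"))]
f = extract_features(kp, angle_pairs, ratio_pairs)

# A multi-class SVM (one-vs-one by default in scikit-learn) would then be
# trained on such feature vectors:
clf = SVC(kernel="rbf")
# clf.fit(X_train, y_train); clf.predict(X_test)
```

With the full eight angle pairs and four ratio pairs, each frame would yield a 12-dimensional feature vector; the ratio features make the representation invariant to the subject's distance from the camera.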
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grants No. 91948201 and 61973135), and the Fundamental Research Funds of Shandong University (Grant No. 2019GN017).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Sun, R., Zhang, Q., Guo, J., Chai, H., Li, Y. (2021). Human Action Recognition Using Skeleton Data from Two-Stage Pose Estimation Model. In: Liu, XJ., Nie, Z., Yu, J., Xie, F., Song, R. (eds) Intelligent Robotics and Applications. ICIRA 2021. Lecture Notes in Computer Science(), vol 13013. Springer, Cham. https://doi.org/10.1007/978-3-030-89095-7_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89094-0
Online ISBN: 978-3-030-89095-7