
Fusing Hand and Body Skeletons for Human Action Recognition in Assembly

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Abstract

As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker’s fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.
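
To make the fusion idea above concrete, the following is a minimal, hypothetical PyTorch sketch of attention-based fusion of body and hand keypoints. It is not the architecture from the paper: the joint counts, embedding size, depth, and class-token pooling are assumptions, and temporal modeling across frames is omitted for brevity.

```python
import torch
import torch.nn as nn


class SkeletonFusionTransformer(nn.Module):
    """Toy fusion of body and hand keypoints with self-attention.

    Hypothetical sketch: joint counts, embedding size, depth, and the
    class-token pooling are assumptions, not the paper's architecture,
    and temporal modeling across frames is omitted.
    """

    def __init__(self, body_joints=17, hand_joints=2 * 21,
                 coord_dim=3, embed_dim=128, num_classes=50):
        super().__init__()
        # Separate linear embeddings for body and hand joint coordinates
        self.body_embed = nn.Linear(coord_dim, embed_dim)
        self.hand_embed = nn.Linear(coord_dim, embed_dim)
        # Learnable class token plus a per-joint position encoding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos = nn.Parameter(
            torch.zeros(1, 1 + body_joints + hand_joints, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, body, hands):
        # body:  (batch, body_joints, coord_dim), a coarse body skeleton
        # hands: (batch, hand_joints, coord_dim), e.g. 21 keypoints per hand
        tokens = torch.cat([self.body_embed(body),
                            self.hand_embed(hands)], dim=1)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos  # add position encoding
        out = self.encoder(x)              # attention mixes body and hand cues
        return self.head(out[:, 0])        # classify from the fused class token
```

For a batch of 8 frames with 17 body joints and 2×21 hand keypoints, `SkeletonFusionTransformer()(torch.randn(8, 17, 3), torch.randn(8, 42, 3))` returns logits of shape (8, 50).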

This work has received funding from the Carl-Zeiss-Stiftung as part of the project Engineering for Smart Manufacturing (E4SM).



Notes

  1. Our preliminary experiments on the ATTACH dataset using hand skeletons alone showed far inferior results compared to using body skeletons alone; hand skeletons on their own are therefore not investigated further.

  2. For ResNet, the image is resized to \(224{\times }224\) with pixel values ranging from 0 to 255. For Swin, we use a resolution of \(256{\times }256\) with pixel values from 0 to 1. A minimal preprocessing sketch follows these notes.
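
As a rough illustration of the preprocessing in note 2, the sketch below resizes a skeleton pseudo-image and scales its pixel values per backbone. The use of OpenCV and the function name are assumptions for illustration; only the target resolutions and value ranges come from the note above.

```python
import cv2  # hypothetical choice of resizing library
import numpy as np


def preprocess_pseudo_image(img: np.ndarray, backbone: str = "resnet") -> np.ndarray:
    """Resize a pseudo-image and scale pixel values for the chosen backbone.

    Illustrative sketch only; the function name and resizing method are
    assumptions. `img` is expected as an HxWxC uint8 array in 0..255.
    """
    if backbone == "resnet":
        out = cv2.resize(img, (224, 224))      # ResNet input resolution
        return out.astype(np.float32)          # keep values in 0..255
    if backbone == "swin":
        out = cv2.resize(img, (256, 256))      # Swin input resolution
        return out.astype(np.float32) / 255.0  # scale values to 0..1
    raise ValueError(f"unknown backbone: {backbone}")
```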

References

  1. Aganian, D., Köhler, M., Baake, S., Eisenbach, M., Gross, H.M.: How object information improves skeleton-based human action recognition in assembly tasks. In: IEEE International Joint Conference on Neural Networks (IJCNN) (2023)

  2. Aganian, D., Stephan, B., Eisenbach, M., Stretz, C., Gross, H.M.: ATTACH dataset: annotated two-handed assembly actions for human action understanding. In: IEEE International Conference on Robotics and Automation (ICRA) (2023)

  3. Ben-Shabat, Y., et al.: The IKEA ASM dataset: understanding people assembling furniture through actions, objects and pose. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2021)

  4. Du, Y., Fu, Y., Wang, L.: Skeleton based action recognition with convolutional neural network. In: IEEE IAPR Asian Conference on Pattern Recognition (ACPR) (2015)

  5. Duan, H., Zhao, Y., Chen, K., Lin, D., Dai, B.: Revisiting skeleton-based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  6. Eisenbach, M., Aganian, D., Köhler, M., Stephan, B., Schröter, C., Gross, H.M.: Visual scene understanding for enabling situation-aware cobots. In: IEEE International Conference on Automation Science and Engineering (CASE) (2021)

  7. Fischedick, S., Seichter, D., Schmidt, R., Rabes, L., Gross, H.M.: Efficient multi-task scene analysis with RGB-D transformers. In: IEEE International Joint Conference on Neural Networks (IJCNN) (2023)

  8. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: IEEE International Conference on Computer Vision (ICCV) (2017)

  9. Inkulu, A.K., Bahubalendruni, M.R., Dara, A., SankaranarayanaSamy, K.: Challenges and opportunities in human robot collaboration context of industry 4.0 - a state of the art review. Ind. Robot: Int. J. Robot. Res. Appl. 49(2) (2021)

  10. Liu, Z., et al.: Swin transformer v2: scaling up capacity and resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  11. Mazzia, V., Angarano, S., Salvetti, F., Angelini, F., Chiaberge, M.: Action transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recogn. 124 (2022)

  12. Ragusa, F., Furnari, A., Livatino, S., Farinella, G.M.: The MECCANO dataset: understanding human-object interactions from egocentric videos in an industrial-like domain. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2021)

  13. Seichter, D., Köhler, M., Lewandowski, B., Wengefeld, T., Gross, H.M.: Efficient RGB-D semantic segmentation for indoor scene analysis. In: International Conference on Robotics and Automation (ICRA) (2021)

  14. Sener, F., et al.: Assembly101: a large-scale multi-view video dataset for understanding procedural activities. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  15. Terreran, M., Lazzaretto, M., Ghidoni, S.: Skeleton-based action and gesture recognition for human-robot collaboration. In: International Conference on Intelligent Autonomous Systems (IAS). Springer (2022). https://doi.org/10.1007/978-3-031-22216-0_3

  16. Trivedi, N., Sarvadevabhatla, R.K.: PSUMNet: unified modality part streams are all you need for efficient pose-based action recognition. In: ECCV Workshop and Challenge on People Analysis (WCPA). Springer (2022). https://doi.org/10.1007/978-3-031-25072-9_14

  17. Trivedi, N., Thatipelli, A., Sarvadevabhatla, R.K.: NTU-X: an enhanced large-scale dataset for improving pose-based recognition of subtle human actions. In: Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP). ACM (2021)

  18. Wang, L., et al.: Symbiotic human-robot collaborative assembly. CIRP Annals 68(2) (2019)

  19. Zhang, F., et al.: MediaPipe hands: on-device real-time hand tracking. In: Workshop on Computer Vision for AR/VR (CV4ARVR) (2020)

  20. Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., Zheng, N.: View adaptive neural networks for high performance skeleton-based human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2019)


Author information


Corresponding author

Correspondence to Dustin Aganian.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Aganian, D., Köhler, M., Stephan, B., Eisenbach, M., Gross, HM. (2023). Fusing Hand and Body Skeletons for Human Action Recognition in Assembly. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14254. Springer, Cham. https://doi.org/10.1007/978-3-031-44207-0_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44207-0_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44206-3

  • Online ISBN: 978-3-031-44207-0

  • eBook Packages: Computer Science, Computer Science (R0)
