Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition

Mahmud, Hasan; Morshed, Mashrur M.; Hasan, Md. Kamrul

doi:10.1007/s00371-022-02762-1

Original article
Published: 04 January 2023

The Visual Computer, Volume 40, pages 11–25 (2024)

Abstract

An existing approach to dynamic hand gesture recognition is to use a multimodal-fusion CRNN (convolutional recurrent neural network) on depth images and the corresponding 2D hand skeleton coordinates. However, an underlying problem with this method is that raw depth images have very low contrast in the hand ROI (region of interest). They do not highlight details that are important for fine-grained hand gesture recognition, such as finger orientation, the overlap between the fingers and the palm, or the overlap between multiple fingers. To address this issue, we propose generating quantized depth images as an alternative input modality to raw depth images. This creates sharp relative contrasts between key parts of the hand, which improves gesture recognition performance. In addition, we explore some ways to tackle the high variance problem in previously researched multimodal-fusion CRNN architectures. We obtained accuracies of 90.82% and 89.21% (14 and 28 gestures, respectively) on the DHG-14/28 dataset and accuracies of 93.81% and 90.24% (14 and 28 gestures, respectively) on the SHREC-2017 dataset, which is a significant improvement over previous multimodal-fusion CRNNs.
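To make the idea of a quantized depth image concrete, the following is a minimal sketch of one possible scheme: uniform quantization of the valid depth values inside a hand ROI. The abstract does not specify the quantization procedure used in the article, so the function name `quantize_depth`, the `levels` parameter, and the convention that a depth of 0 marks background are all assumptions made for illustration only.

```python
from typing import List, Sequence


def quantize_depth(
    depth: Sequence[Sequence[float]],
    levels: int = 8,
    background: float = 0.0,
) -> List[List[int]]:
    """Uniformly quantize valid depth values into `levels` bins.

    Assumption (not from the article): pixels equal to `background`
    carry no depth reading and are mapped to 0. Valid pixels are
    mapped to integers 1..levels, nearest depth first.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")

    # Collect the valid depth readings to find the ROI's depth range.
    valid = [d for row in depth for d in row if d != background]
    if not valid:
        return [[0 for _ in row] for row in depth]

    lo, hi = min(valid), max(valid)
    span = hi - lo

    quantized: List[List[int]] = []
    for row in depth:
        q_row: List[int] = []
        for d in row:
            if d == background:
                q_row.append(0)
            elif span == 0:
                # All valid pixels share one depth: a single level.
                q_row.append(1)
            else:
                # Normalize to [0, 1], scale to a bin index, and clamp
                # the farthest pixel into the last bin.
                idx = min(int((d - lo) / span * levels), levels - 1)
                q_row.append(idx + 1)
        quantized.append(q_row)
    return quantized


# Hypothetical 2x3 hand patch with depths in millimetres; 0 is background.
patch = [[0, 500, 510], [0, 520, 540]]
quantized = quantize_depth(patch, levels=4)
```

Because the bins span only the depth range actually present in the ROI, small depth differences between fingers and palm are stretched across the full set of levels, which is the contrast effect the abstract describes. Other binning strategies (for example, non-uniform bins) would fit the same description.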

Availability of data and materials

The associated code is available at this GitHub repository: https://github.com/ID56/Multimodal-Fusion-CRNN.

Author information

Authors and Affiliations

Systems and Software Lab (SSL), Department of Computer Science and Engineering, Islamic University of Technology, Board Bazar, Gazipur, Dhaka, 1704, Bangladesh
Hasan Mahmud, Mashrur M. Morshed & Md. Kamrul Hasan

Corresponding author

Correspondence to Hasan Mahmud.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahmud, H., Morshed, M.M. & Hasan, M.K. Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition. Vis Comput 40, 11–25 (2024). https://doi.org/10.1007/s00371-022-02762-1

Accepted: 20 December 2022
Published: 04 January 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00371-022-02762-1
