Abstract
Video-based gesture recognition plays an important role in human-computer interaction (HCI), and most existing video-based gesture recognition work relies on traditional RGB gesture videos. Compared with RGB videos, RGB-D gesture videos carry additional depth information for each frame, which is considered effective in mitigating the impact of illumination and background variations. To the best of our knowledge, few RGB-D gesture video datasets fully account for illumination and background variations. We believe such variations are common in daily usage scenarios, and ignoring them creates unnecessary obstacles to the development of gesture recognition algorithms. Motivated by this observation, this paper uses embedded devices to collect and classify a set of RGB-D gesture videos that retain both color and depth information, and proposes a new RGB-D gesture video dataset named DG-20. Specifically, DG-20 explicitly varies illumination and background during data capture, providing more realistic RGB-D gesture video data for future research on RGB-D gesture recognition algorithms. Furthermore, we provide benchmark evaluations of DG-20 on two representative lightweight 3D CNN networks. Experimental results show that the depth information encoded in RGB-D gesture videos can effectively improve classification accuracy under dramatic changes in illumination and background.
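The abstract describes feeding RGB-D gesture videos to lightweight 3D CNN classifiers. As a minimal sketch of one common way to do this (not the authors' actual networks or code), the RGB and depth streams of a clip can be stacked channel-wise into a 4-channel spatio-temporal tensor and passed to a small 3D CNN. The class name `RGBDClassifier`, the layer sizes, and the assumption of 20 classes (suggested by the dataset name DG-20) are illustrative assumptions only.

```python
# Illustrative sketch only: a 4-channel (RGB + depth) clip classified by a
# small 3D CNN. This is not the paper's architecture; it only shows how
# depth can be fused with color at the input level.
import torch
import torch.nn as nn


class RGBDClassifier(nn.Module):
    def __init__(self, num_classes: int = 20):  # 20 classes assumed from "DG-20"
        super().__init__()
        # Expected input shape: (batch, 4 channels [R, G, B, depth], frames, H, W)
        self.features = nn.Sequential(
            nn.Conv3d(4, 16, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return self.classifier(x)


if __name__ == "__main__":
    # Usage: a batch of two 16-frame 112x112 RGB-D clips (random data).
    model = RGBDClassifier(num_classes=20)
    clip = torch.randn(2, 4, 16, 112, 112)
    logits = model(clip)  # shape: (2, 20)
    print(logits.shape)
```

Early fusion of depth as an extra input channel is only one option; two-stream designs that process color and depth separately and merge features later are also common in RGB-D gesture recognition.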