Abstract
We study the problem of automatic sign language recognition from RGB videos and skeleton joint coordinates captured by a Kinect sensor, a task of great significance for communication between the deaf and hearing communities. In this paper, we propose a sign language recognition (SLR) system that operates on two channels of data: the gesture videos of sign words and the corresponding joint trajectories. Our framework extracts two modalities of features, one representing the hand shape videos and one representing the hand trajectories. The temporal variation of each gesture is captured by a 3D convolutional neural network (3D CNN), and the activations of its fully connected layers serve as the representation of the sign video. For the trajectories, we describe each joint with a shape context descriptor and combine all descriptors into a feature matrix, to which a convolutional neural network is then applied to generate a robust trajectory representation. Finally, we fuse the two feature types and train an SVM classifier for recognition. Experiments on a large-vocabulary sign language dataset with up to 500 words demonstrate the effectiveness of the proposed method.
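To make the fusion-and-classification step concrete, below is a minimal Python sketch of the pipeline the abstract describes, under the assumption that the per-modality features have already been extracted. The extractor outputs, the helper names fuse_features and train_classifier, and the use of scikit-learn's SVC (in place of whatever SVM toolkit the authors used) are all illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of two-channel feature fusion + SVM classification (assumed design,
# not the authors' code). Inputs are precomputed features:
#   X_video: (num_samples, d_video) fully connected activations of a 3D CNN
#   X_traj:  (num_samples, d_traj)  trajectory-CNN representations built from
#                                   per-joint shape context descriptors
#   y:       (num_samples,)         sign-word labels (up to 500 classes)
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC


def fuse_features(video_feat: np.ndarray, traj_feat: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized per-modality features into one vector."""
    v = normalize(video_feat.reshape(1, -1))  # video-channel feature
    t = normalize(traj_feat.reshape(1, -1))   # trajectory-channel feature
    return np.hstack([v, t]).ravel()


def train_classifier(X_video: np.ndarray, X_traj: np.ndarray, y: np.ndarray) -> SVC:
    """Fuse both channels per sample and fit a multi-class SVM."""
    X = np.stack([fuse_features(v, t) for v, t in zip(X_video, X_traj)])
    clf = SVC(kernel="linear", C=1.0)  # scikit-learn handles multi-class internally
    clf.fit(X, y)
    return clf
```

Early fusion by concatenation, as sketched here, lets a single classifier weigh hand-shape and trajectory evidence jointly; score-level (late) fusion of two per-channel classifiers would be the natural alternative.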
Acknowledgement
This work is supported in part, for Prof. Houqiang Li, by the 973 Program under Contract 2015CB351803 and by the National Natural Science Foundation of China (NSFC) under Contracts 61390514 and 61325009, and in part, for Dr. Wengang Zhou, by NSFC under Contract 61472378, the Natural Science Foundation of Anhui Province under Contract 1508085MF109, and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Pu, J., Zhou, W., Li, H. (2016). Sign Language Recognition with Multi-modal Features. In: Chen, E., Gong, Y., Tie, Y. (eds.) Advances in Multimedia Information Processing - PCM 2016. Lecture Notes in Computer Science, vol. 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_25
DOI: https://doi.org/10.1007/978-3-319-48896-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7