Sign Language Recognition with Multi-modal Features | SpringerLink

Sign Language Recognition with Multi-modal Features

  • Conference paper
  • First Online:
Advances in Multimedia Information Processing - PCM 2016 (PCM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)


Abstract

We study the problem of automatic sign language recognition from RGB videos and skeleton coordinates captured by Kinect, which is of great significance for communication between the deaf and hearing communities. In this paper, we propose a sign language recognition (SLR) system that uses two channels of data: the gesture videos of the signed words and the joint trajectories. In our framework, we extract two modalities of features, representing the hand-shape videos and the hand trajectories, for recognition. The variation of each gesture is captured by a 3D CNN, and the activations of its fully connected layers serve as the representations of the sign videos. For the trajectories, we use the shape context to describe each joint and combine the descriptors into a feature matrix; a convolutional neural network is then applied to generate a robust representation of these trajectories. Finally, we fuse the two kinds of features and train an SVM classifier for recognition. We conduct experiments on a large-vocabulary sign language dataset with up to 500 words, and the results demonstrate the effectiveness of the proposed method.
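To illustrate the trajectory branch of the pipeline, the sketch below computes a Belongie-style shape context descriptor for each point of a 2D joint trajectory: every point is described by a log-polar histogram of the relative positions of the remaining points, and the per-point histograms are stacked into the kind of feature matrix the abstract describes. The bin counts, radial range, and normalization here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar shape context histogram for each point of a 2D trajectory.

    Each point is described by where the other points fall around it,
    binned by log-distance (n_r bins) and angle (n_theta bins).
    Bin settings are illustrative, not the paper's configuration.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diffs = points[None, :, :] - points[:, None, :]     # pairwise offsets
    dists = np.linalg.norm(diffs, axis=2)
    angles = np.arctan2(diffs[..., 1], diffs[..., 0])   # in (-pi, pi]
    # Normalize distances by the mean pairwise distance for scale invariance.
    mean_d = dists[dists > 0].mean()
    # Log-spaced radial bin edges between 1/8 and 2x the mean distance.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    histograms = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dists[i, j] / mean_d) - 1
            if r_bin < 0 or r_bin >= n_r:
                continue  # point falls outside the radial range
            t_bin = int((angles[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
            histograms[i, r_bin, t_bin] += 1
    # Flatten each point's histogram into one row of the feature matrix.
    return histograms.reshape(n, -1)

# Example: a short, hypothetical joint trajectory of 5 positions.
traj = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 0.0]])
feat = shape_context(traj)
print(feat.shape)  # (5, 60): one flattened 5x12 log-polar histogram per point
```

In the paper's pipeline this feature matrix would then be fed to a CNN, and the resulting trajectory representation fused with the 3D-CNN video features before SVM classification.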



Acknowledgement

This work was supported in part for Prof. Houqiang Li by the 973 Program under Contract 2015CB351803 and by the National Natural Science Foundation of China (NSFC) under Contracts 61390514 and 61325009, and in part for Dr. Wengang Zhou by NSFC under Contract 61472378, the Natural Science Foundation of Anhui Province under Contract 1508085MF109, and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Wengang Zhou.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Pu, J., Zhou, W., Li, H. (2016). Sign Language Recognition with Multi-modal Features. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_25

  • DOI: https://doi.org/10.1007/978-3-319-48896-7_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48895-0

  • Online ISBN: 978-3-319-48896-7

  • eBook Packages: Computer Science, Computer Science (R0)
