Abstract
One of the challenges in computer vision models, especially for sign language, is real-time recognition. In this work, we present a simple yet efficient, low-complexity model, comprising a single shot detector, a 2D convolutional neural network, singular value decomposition (SVD), and a long short-term memory network, for real-time isolated hand sign language recognition (IHSLR) from RGB video. We employ SVD as an efficient, compact, and discriminative feature extractor on the estimated 3D hand keypoint coordinates. In contrast to previous works, which use the estimated 3D hand keypoint coordinates as raw features, we propose a novel way of applying SVD to these coordinates to obtain more discriminative features. SVD is also applied to the geometric relations between the consecutive segments of each finger in each hand and to the angles between these segments. We perform a detailed analysis of recognition time and accuracy. To the best of our knowledge, this is the first time that SVD has been applied to hand pose parameters. Results on four datasets, RKS-PERSIANSIGN (\(99.5 \pm 0.04\)), First-Person (\(91 \pm 0.06\)), ASVID (\(93 \pm 0.05\)), and isoGD (\(86.1 \pm 0.04\)), confirm the efficiency of our method in both accuracy (\(mean \pm std\)) and recognition time. Furthermore, our model outperforms, or achieves results competitive with, state-of-the-art alternatives in IHSLR and hand action recognition.
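The SVD-based feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the 21-keypoint hand layout, the finger chain indices, and the choice to keep the singular values together with \(V^T\) as the descriptor are all assumptions made for the example.

```python
import numpy as np

def svd_features(keypoints):
    """Compress a (21, 3) matrix of estimated 3D hand keypoints into a
    compact descriptor via singular value decomposition (illustrative)."""
    u, s, vt = np.linalg.svd(keypoints, full_matrices=False)
    # Singular values summarize the spread of the pose along its principal
    # directions; V^T encodes those directions themselves.
    return np.concatenate([s, vt.ravel()])  # 3 + 9 = 12-dimensional

def segment_angles(keypoints, finger_chains):
    """Angles between consecutive segments of each finger."""
    angles = []
    for chain in finger_chains:          # joint indices along one finger
        pts = keypoints[chain]
        vecs = np.diff(pts, axis=0)      # consecutive bone vectors
        for a, b in zip(vecs[:-1], vecs[1:]):
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.asarray(angles)

# Example frame: 21 joints with 3D coordinates (random stand-in data).
kp = np.random.rand(21, 3)
# Hypothetical wrist-to-fingertip chains, one per finger.
chains = [[0, 1, 2, 3, 4], [0, 5, 6, 7, 8], [0, 9, 10, 11, 12],
          [0, 13, 14, 15, 16], [0, 17, 18, 19, 20]]
frame_feature = np.concatenate([svd_features(kp), segment_angles(kp, chains)])
```

A per-frame vector of this kind would then be fed, frame by frame, into the LSTM for temporal classification.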
Acknowledgements
This work has been partially supported by the Spanish project PID2019-105093GB-I00 (MINECO/FEDER, UE), the CERCA Programme/Generalitat de Catalunya, ICREA under the ICREA Academia programme, and the High Intelligent Solution (HIS) company in Iran. We thank NVIDIA Corporation for processing support.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Contributions
RR: methodology, software, data curation, writing—original draft, visualization. KK: conceptualization, data curation, writing—review & editing, supervision, project administration. SE: conceptualization, writing—review & editing, supervision, project administration.
Corresponding author
Ethics declarations
Conflict of interest
The authors certify that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors confirm their consent for publication.
Availability of data and material (data transparency)
Not applicable.
Code availability (software application or custom code)
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Rastgoo, R., Kiani, K. & Escalera, S. Real-time isolated hand sign language recognition using deep networks and SVD. J Ambient Intell Human Comput 13, 591–611 (2022). https://doi.org/10.1007/s12652-021-02920-8