Abstract
One of the challenges in computer vision models, especially for sign language, is real-time recognition. In this work, we present a simple yet efficient, low-complexity model, comprising a single shot detector, a 2D convolutional neural network, singular value decomposition (SVD), and a long short-term memory network, for real-time isolated hand sign language recognition (IHSLR) from RGB video. We employ SVD as an efficient, compact, and discriminative feature extractor on the estimated 3D hand keypoint coordinates. In contrast to previous works, which use the estimated 3D hand keypoint coordinates as raw features, we propose a novel way of applying SVD to these coordinates to obtain more discriminative features. SVD is also applied to the geometric relations between the consecutive segments of each finger in each hand and to the angles between these segments. We perform a detailed analysis of recognition time and accuracy. To the best of our knowledge, this is the first time that SVD has been applied to hand pose parameters. Results on four datasets, RKS-PERSIANSIGN (\(99.5 \pm 0.04\)), First-Person (\(91 \pm 0.06\)), ASVID (\(93 \pm 0.05\)), and isoGD (\(86.1 \pm 0.04\)), confirm the efficiency of our method in both accuracy (\(mean \pm std\)) and recognition time. Furthermore, our model outperforms, or achieves results competitive with, state-of-the-art alternatives in IHSLR and hand action recognition.
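The SVD-based feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the 21-keypoint hand layout, the finger chain indices, and the choice to keep the singular values together with \(V^T\) as the descriptor are all assumptions made for the example.

```python
import numpy as np

def svd_features(keypoints):
    """Compress a (21, 3) matrix of estimated 3D hand keypoints into a
    compact descriptor via singular value decomposition (illustrative)."""
    u, s, vt = np.linalg.svd(keypoints, full_matrices=False)
    # Singular values summarize the spread of the pose along its principal
    # directions; V^T encodes those directions themselves.
    return np.concatenate([s, vt.ravel()])  # 3 + 9 = 12-dimensional

def segment_angles(keypoints, finger_chains):
    """Angles between consecutive segments of each finger."""
    angles = []
    for chain in finger_chains:          # joint indices along one finger
        pts = keypoints[chain]
        vecs = np.diff(pts, axis=0)      # consecutive bone vectors
        for a, b in zip(vecs[:-1], vecs[1:]):
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.asarray(angles)

# Example frame: 21 joints with 3D coordinates (random stand-in data).
kp = np.random.rand(21, 3)
# Hypothetical wrist-to-fingertip chains, one per finger.
chains = [[0, 1, 2, 3, 4], [0, 5, 6, 7, 8], [0, 9, 10, 11, 12],
          [0, 13, 14, 15, 16], [0, 17, 18, 19, 20]]
frame_feature = np.concatenate([svd_features(kp), segment_angles(kp, chains)])
```

A per-frame vector of this kind would then be fed, frame by frame, into the LSTM for temporal classification.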
Acknowledgements
This work has been partially supported by the Spanish project PID2019-105093GB-I00 (MINECO/FEDER, UE), the CERCA Programme/Generalitat de Catalunya, ICREA under the ICREA Academia programme, and the High Intelligent Solution (HIS) company in Iran. We thank NVIDIA Corporation for processing support.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Contributions
RR: methodology, software, data curation, writing—original draft, visualization. KK: conceptualization, data curation, writing—review & editing, supervision, project administration. SE: conceptualization, writing—review & editing, supervision, project administration.
Corresponding author
Ethics declarations
Conflict of interest
The authors certify that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors confirm their consent for publication.
Availability of data and material (data transparency)
Not applicable.
Code availability (software application or custom code)
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Rastgoo, R., Kiani, K. & Escalera, S. Real-time isolated hand sign language recognition using deep networks and SVD. J Ambient Intell Human Comput 13, 591–611 (2022). https://doi.org/10.1007/s12652-021-02920-8