Abstract
Deaf people communicate naturally through sign languages and often face barriers when communicating with hearing people and accessing information in written languages. These difficulties are aggravated in the health domain, especially in hospital emergencies, when human sign language interpreters are unavailable. To mitigate this problem, this paper proposes a solution for automatically recognizing signs of Brazilian Sign Language (Libras) in the health context. The idea is that, in the future, the system could assist communication between a Deaf patient and their physician. Our solution is a multiple-stream architecture that combines convolutional and recurrent neural networks, handling the visual phonemes of sign languages in individual and specialized ways. The first stream uses optical flow as input to capture information about the “movement” of the sign; the second stream extracts kinematic and postural features, including “handshapes” and “facial expressions”; and the third stream processes the raw RGB images to capture additional attributes of the sign not covered by the previous streams. Thus, we can exploit more spatiotemporal features that discriminate the classes during the training stage. The computational results show that the solution can recognize signs in Libras in the health context, with an average accuracy, precision, recall, and F1-score of 99.80%, 99.81%, 99.80%, and 99.80%, respectively. Our system also outperformed other works in the literature, obtaining an average accuracy of 100% on an Argentine Sign Language (LSA) dataset commonly used for comparison.
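As a rough illustration of this design, the sketch below (in Keras) builds three input streams — optical flow, kinematic/postural features, and raw RGB frames — each followed by a recurrent layer, and fuses them before a softmax classifier. This is a minimal sketch of the general idea, not the network reported in the paper: the clip length, frame size, feature dimension, layer sizes, number of classes, and the choice of concatenative fusion are all illustrative assumptions.

```python
# Minimal sketch of a three-stream CNN+RNN sign recognizer.
# All shapes and layer sizes below are illustrative assumptions,
# not the paper's exact configuration.
from tensorflow.keras import layers, Model

NUM_FRAMES, H, W = 16, 112, 112    # hypothetical clip length and frame size
NUM_POSE_FEATURES = 274            # hypothetical pose/hand/face feature length
NUM_CLASSES = 50                   # hypothetical number of signs

def frame_encoder(channels, name):
    """Small per-frame CNN, applied to every frame via TimeDistributed."""
    inp = layers.Input(shape=(H, W, channels))
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return Model(inp, x, name=name)

# Stream 1: optical flow (2 channels per frame) -> CNN -> LSTM
flow_in = layers.Input(shape=(NUM_FRAMES, H, W, 2), name="optical_flow")
flow = layers.TimeDistributed(frame_encoder(2, "flow_cnn"))(flow_in)
flow = layers.LSTM(128)(flow)

# Stream 2: kinematic/postural features (e.g. keypoints) -> LSTM
pose_in = layers.Input(shape=(NUM_FRAMES, NUM_POSE_FEATURES), name="pose")
pose = layers.LSTM(128)(pose_in)

# Stream 3: raw RGB frames -> CNN -> LSTM
rgb_in = layers.Input(shape=(NUM_FRAMES, H, W, 3), name="rgb")
rgb = layers.TimeDistributed(frame_encoder(3, "rgb_cnn"))(rgb_in)
rgb = layers.LSTM(128)(rgb)

# Late fusion by concatenation, then classification over sign classes
fused = layers.Concatenate()([flow, pose, rgb])
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = Model([flow_in, pose_in, rgb_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```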
Availability of data and material
The data generated or used during this study are available from the corresponding author upon request.
Notes
Multi-stream architectures comprise multiple channels, each with its own data and processing, whose outputs are merged using concatenative, additive, subtractive, multiplicative, or statistical fusion operations, among others.
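For illustration, the snippet below shows how these fusion strategies map onto standard Keras merge layers; the stream names and feature sizes are placeholders chosen for this example.

```python
# Common multi-stream fusion strategies expressed as Keras merge layers.
# The two 128-dimensional stream outputs are placeholder inputs.
from tensorflow.keras import layers

stream_a = layers.Input(shape=(128,))
stream_b = layers.Input(shape=(128,))

concat_fusion = layers.Concatenate()([stream_a, stream_b])      # concatenative
additive_fusion = layers.Add()([stream_a, stream_b])            # additive
subtractive_fusion = layers.Subtract()([stream_a, stream_b])    # subtractive
multiplicative_fusion = layers.Multiply()([stream_a, stream_b]) # multiplicative
statistical_fusion = layers.Average()([stream_a, stream_b])     # statistical (mean)
```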
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. We gratefully acknowledge NVIDIA Corporation’s support with the donation of a Quadro P6000 used for this research.
Funding
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Author information
Contributions
Diego R. B. da Silva and Tiago Maritan U. de Araújo conceived and designed the approach. Thaís Gaudencio do Rêgo contributed to the experimental design of the study. Manuella Aschoff Cavalcanti Brandão helped with data collection. The first draft of the manuscript was written by Diego R. B. da Silva. Tiago Maritan U. de Araújo and Luiz Marcos G. Gonçalves revised and corrected the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
da Silva, D.R.B., de Araújo, T.M.U., do Rêgo, T.G. et al. A multiple stream architecture for the recognition of signs in Brazilian sign language in the context of health. Multimed Tools Appl 83, 19767–19785 (2024). https://doi.org/10.1007/s11042-023-16332-7