
A multiple stream architecture for the recognition of signs in Brazilian sign language in the context of health

Multimedia Tools and Applications

Abstract

Deaf people communicate naturally through sign languages and often face barriers when communicating with hearing people and accessing information in written languages. These communication difficulties are aggravated in the health domain, especially in hospital emergencies, when human sign language interpreters are unavailable. This paper proposes a solution for automatically recognizing signs in Brazilian Sign Language (Libras) in the health context to reduce this problem. The idea is that, in the future, the system could assist communication between a Deaf patient and their doctor. Our solution involves a multiple-stream architecture that combines convolutional and recurrent neural networks, handling the visual phonemes of sign languages in individual and specialized ways. The first stream uses optical flow as input to capture information about the “movement” of the sign; the second stream extracts kinematic and postural features, including “handshapes” and “facial expressions”; and the third stream processes the raw RGB images to address additional attributes of the sign not captured by the previous streams. Thus, we can process more spatiotemporal features that discriminate the classes during the training stage. The computational results show that the solution can recognize signs in Libras in the health context, with an average accuracy, precision, recall, and F1-score of 99.80%, 99.81%, 99.80%, and 99.80%, respectively. Our system also outperformed other works in the literature, obtaining an average accuracy of 100% on an Argentine Sign Language (LSA) dataset that is commonly used for comparison purposes.
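To make the multiple-stream design concrete, the minimal sketch below shows how three such streams (optical flow, kinematic/postural features, and raw RGB frames) could be encoded with frame-wise CNNs and LSTMs and merged by late concatenation. This is an illustrative example in Keras, not the authors' implementation; the clip length, frame size, per-frame keypoint dimension, layer widths, and number of classes are assumed placeholders.

```python
# Hypothetical three-stream CNN+RNN classifier, loosely following the architecture
# described in the abstract. All sizes below are assumptions, not the paper's values.
from tensorflow.keras import layers, models

NUM_FRAMES, H, W = 16, 224, 224   # assumed clip length and frame size
NUM_KEYPOINT_FEATURES = 274       # assumed per-frame body/hand/face keypoint vector
NUM_CLASSES = 50                  # assumed size of the health-domain sign vocabulary

def cnn_lstm_stream(input_shape, name):
    """A small per-stream encoder: frame-wise CNN features followed by an LSTM."""
    inp = layers.Input(shape=input_shape, name=f"{name}_input")
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(inp)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    x = layers.LSTM(128)(x)       # temporal modelling of the frame sequence
    return inp, x

# Stream 1: optical flow (2 channels per frame); stream 3: raw RGB frames (3 channels).
flow_in, flow_feat = cnn_lstm_stream((NUM_FRAMES, H, W, 2), "flow")
rgb_in, rgb_feat = cnn_lstm_stream((NUM_FRAMES, H, W, 3), "rgb")

# Stream 2: kinematic and postural features (e.g. body, hand and face keypoints per frame).
pose_in = layers.Input(shape=(NUM_FRAMES, NUM_KEYPOINT_FEATURES), name="pose_input")
pose_feat = layers.LSTM(128)(pose_in)

# Late fusion by concatenation, followed by the classification head.
fused = layers.Concatenate()([flow_feat, pose_feat, rgb_feat])
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = models.Model(inputs=[flow_in, pose_in, rgb_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```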


Availability of data and material

The data generated or used during the study are available from the corresponding author on request.

Notes

  1. Multi-stream architectures comprise multiple channels, each with different input data and processing, whose outputs are merged using concatenative, additive, subtractive, multiplicative, statistical, or other fusion operations.
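As an illustration of these fusion options, the short sketch below applies concatenative, additive, subtractive, multiplicative, and a simple statistical (mean) merge to two hypothetical stream feature vectors; the shapes and values are assumptions for demonstration only.

```python
# Illustrative fusion operations for multi-stream features. The vectors are placeholders.
import numpy as np

stream_a = np.random.rand(128)   # e.g. features from the optical-flow stream
stream_b = np.random.rand(128)   # e.g. features from the pose stream

concat_fusion = np.concatenate([stream_a, stream_b])        # concatenative: 256-d vector
additive_fusion = stream_a + stream_b                        # additive (element-wise sum)
subtractive_fusion = stream_a - stream_b                     # subtractive (element-wise difference)
multiplicative_fusion = stream_a * stream_b                  # multiplicative (Hadamard product)
statistical_fusion = np.mean([stream_a, stream_b], axis=0)   # statistical: per-dimension mean

print(concat_fusion.shape, additive_fusion.shape, statistical_fusion.shape)
```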


Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. We gratefully acknowledge NVIDIA Corporation’s support with the donation of a Quadro P6000 used for this research.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Authors and Affiliations

Authors

Contributions

Diego R. B. da Silva and Tiago Maritan U. de Araújo conceived and designed the approach. Thaís Gaudencio do Rêgo contributed to the experimental design of the study. Manuella Aschoff Cavalcanti Brandão helped with data collection. The first draft of the manuscript was written by Diego R. B. da Silva. Tiago Maritan U. de Araújo and Luiz Marcos G. Gonçalves thoroughly revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Diego R. B. da Silva.

Ethics declarations

Ethics approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

da Silva, D.R.B., de Araújo, T.M.U., do Rêgo, T.G. et al. A multiple stream architecture for the recognition of signs in Brazilian sign language in the context of health. Multimed Tools Appl 83, 19767–19785 (2024). https://doi.org/10.1007/s11042-023-16332-7
