Automatic fingersign-to-speech translation system

  • Original Paper
  • Published:
Journal on Multimodal User Interfaces

Abstract

The aim of this paper is to facilitate communication between two people, one hearing impaired and one visually impaired, by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language that uses finger signs to spell the letters of a spoken or written language. We aim to convert fingerspelled words to speech and vice versa. Several spoken and sign languages, including English, Russian, Turkish and Czech, are considered.
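
To illustrate the fingerspelling-to-speech direction described above, the sketch below shows how per-frame handshape predictions could be collapsed into a letter sequence and handed to a speech synthesizer. This is a minimal sketch, not the authors' implementation: the callables classify_handshape and synthesize are hypothetical placeholders standing in for a fingersign recognizer and a text-to-speech back end.

```python
# Minimal sketch of a fingerspelling-to-speech pipeline (assumed structure,
# not the paper's actual code). `classify_handshape` and `synthesize` are
# hypothetical stand-ins for a handshape classifier and a TTS engine.
from itertools import groupby
from typing import Callable, Iterable, List


def letters_from_frames(
    frames: Iterable,
    classify_handshape: Callable[[object], str],
) -> List[str]:
    """Classify each video frame and collapse runs of identical predictions,
    since a fingerspelled letter is normally held over many frames."""
    per_frame = [classify_handshape(frame) for frame in frames]
    return [letter for letter, _ in groupby(per_frame)]


def fingerspelling_to_speech(
    frames: Iterable,
    classify_handshape: Callable[[object], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Spell a word from the recognized letters and synthesize it as audio."""
    word = "".join(letters_from_frames(frames, classify_handshape))
    return synthesize(word)
```

The reverse, speech-to-fingerspelling direction would analogously map the output of a speech recognizer to a sequence of finger signs, one per letter of the recognized word.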

Author information

Corresponding author

Correspondence to Marek Hrúz.

About this article

Cite this article

Hrúz, M., Campr, P., Dikici, E. et al. Automatic fingersign-to-speech translation system. J Multimodal User Interfaces 4, 61–79 (2011). https://doi.org/10.1007/s12193-011-0059-3
