Automatic fingersign-to-speech translation system

Hrúz, Marek; Campr, Pavel; Dikici, Erinç; Kındıroğlu, Ahmet Alp; Krňoul, Zdeněk; Ronzhin, Alexander; Sak, Haşim; Schorno, Daniel; Yalçın, Hülya; Akarun, Lale; Aran, Oya; Karpov, Alexey; Saraçlar, Murat; Železný, Milos

doi:10.1007/s12193-011-0059-3

Original Paper
Published: 05 July 2011

Journal on Multimodal User Interfaces, Volume 4, pages 61–79 (2011)

Marek Hrúz¹,
Pavel Campr¹,
Erinç Dikici²,
Ahmet Alp Kındıroğlu²,
Zdeněk Krňoul¹,
Alexander Ronzhin³,
Haşim Sak²,
Daniel Schorno⁵,
Hülya Yalçın²,
Lale Akarun²,
Oya Aran⁴,
Alexey Karpov³,
Murat Saraçlar² &
Milos Železný¹

Abstract

This paper aims to facilitate communication between two people, one hearing impaired and one visually impaired, by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language that uses finger signs to spell out the letters of a spoken or written language. Several spoken languages and sign languages, such as English, Russian, Turkish and Czech, are considered.

Author information

Authors and Affiliations

Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Marek Hrúz, Pavel Campr, Zdeněk Krňoul & Milos Železný
Bogazici University, Istanbul, Turkey
Erinç Dikici, Ahmet Alp Kındıroğlu, Haşim Sak, Hülya Yalçın, Lale Akarun & Murat Saraçlar
SPIIRAS Institute, St. Petersburg, Russia
Alexander Ronzhin & Alexey Karpov
Idiap Research Institute, Martigny, Switzerland
Oya Aran
STEIM, Amsterdam, Netherlands
Daniel Schorno

Corresponding author

Correspondence to Marek Hrúz.

About this article

Cite this article

Hrúz, M., Campr, P., Dikici, E. et al. Automatic fingersign-to-speech translation system. J Multimodal User Interfaces 4, 61–79 (2011). https://doi.org/10.1007/s12193-011-0059-3

Received: 04 February 2011
Accepted: 29 April 2011
Published: 05 July 2011
Issue Date: July 2011
DOI: https://doi.org/10.1007/s12193-011-0059-3
