Abstract
This paper aims to support communication between two people, one hearing impaired and one visually impaired, by converting fingerspelled words to speech and speech to fingerspelling. Fingerspelling is a subset of sign language that uses finger signs to spell out the letters of a spoken or written language. Several spoken and sign languages are considered, including English, Russian, Turkish and Czech.
Cite this article
Hrúz, M., Campr, P., Dikici, E. et al. Automatic fingersign-to-speech translation system. J Multimodal User Interfaces 4, 61–79 (2011). https://doi.org/10.1007/s12193-011-0059-3
DOI: https://doi.org/10.1007/s12193-011-0059-3