DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis

Hadj Ali, Ikbel; Mnasri, Zied; Lachiri, Zied

doi:10.1007/s10772-020-09750-7

Published: 25 August 2020

Volume 23, pages 569–584, (2020)

International Journal of Speech Technology

Abstract

Arabic text-to-speech (TTS) synthesis from non-diacritized text remains a major challenge because of rules and characteristics specific to the Arabic language. In particular, the diacritic and gemination signs, special characters that represent short vowels and consonant doubling respectively, strongly affect the correct pronunciation of Arabic. However, these signs are often omitted from written text, since most Arabic readers are used to inferring them from context. To tackle this issue, this paper presents a grapheme-to-phoneme conversion system for Arabic, which constitutes the text processing module of a deep neural network (DNN)-based Arabic TTS system. For Arabic text, this step starts with predicting the diacritic and gemination signs. In this work, this step is realized entirely with DNNs. Finally, the grapheme-to-phoneme conversion of the diacritized text is carried out using the Buckwalter code. Compared with state-of-the-art approaches, the proposed system achieves a higher accuracy rate, both over all phonemes and for each class, as well as high precision, recall and F1 score for each class of diacritic signs.
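To illustrate the final stage mentioned in the abstract, the following is a minimal Python sketch of mapping already diacritized Arabic text to Buckwalter symbols. It uses the standard Buckwalter transliteration table. The function names `to_buckwalter` and `expand_gemination` are hypothetical, and the gemination handling (doubling the consonant that a shadda follows) is an assumption made for illustration, not the authors' published phoneme mapping.

```python
# Sketch only: standard Buckwalter transliteration of diacritized Arabic text.
# Function names and the gemination rule are illustrative assumptions,
# not the method described in the paper.

def _table(start, symbols):
    """Map consecutive Unicode code points, beginning at `start`, to symbols."""
    return {chr(start + i): s for i, s in enumerate(symbols)}

BUCKWALTER = {
    # U+0621..U+063A: hamza forms, alef, and the letters up to ghain
    **_table(0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),
    # U+0640..U+0652: tatweel, remaining letters, tanween, short vowels,
    # shadda (~) and sukun (o)
    **_table(0x0640, "_fqklmnhwYyFNKaui~o"),
    "\u0670": "`",  # dagger alef
    "\u0671": "{",  # alef wasla
}

def to_buckwalter(text):
    """Transliterate Arabic characters; leave all other characters unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)

def expand_gemination(bw):
    """Replace each shadda symbol '~' with a copy of the preceding symbol.

    Assumes the shadda directly follows its consonant in the input.
    """
    out = []
    for ch in bw:
        if ch == "~" and out:
            out.append(out[-1])
        else:
            out.append(ch)
    return "".join(out)

# Diacritized word "mudarris" (teacher), written with code points so the
# example does not depend on right-to-left rendering.
word = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633"
bw = to_buckwalter(word)
print(bw)                     # mudar~is
print(expand_gemination(bw))  # mudarris
```

Because Buckwalter is a one-to-one character mapping, a step like this is only as accurate as the diacritic and gemination prediction that precedes it, which is why the abstract focuses on that prediction.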


Funding

Funding was provided by the Signal, Image and Technology of Information Laboratory, Electrical Engineering Department.

Author information

Authors and Affiliations

Signal, Image and Technology of Information Laboratory, Electrical Engineering Department, Ecole Nationale d’Ingénieurs de Tunis, University Tunis El-Manar, Tunis, Tunisia
Ikbel Hadj Ali, Zied Mnasri & Zied Lachiri

Corresponding author

Correspondence to Ikbel Hadj Ali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hadj Ali, I., Mnasri, Z. & Lachiri, Z. DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis. Int J Speech Technol 23, 569–584 (2020). https://doi.org/10.1007/s10772-020-09750-7

Received: 09 March 2020
Accepted: 16 August 2020
Published: 25 August 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10772-020-09750-7
