DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis

International Journal of Speech Technology

Abstract

Arabic text-to-speech synthesis from non-diacritized text remains a major challenge because of the unique rules and characteristics of the Arabic language. The diacritic and gemination signs, special characters that represent short vowels and consonant doubling respectively, strongly affect the accurate pronunciation of Arabic. However, these signs are usually omitted from written text, since most Arab readers are accustomed to inferring them from context. To tackle this issue, this paper presents a grapheme-to-phoneme conversion system for Arabic, which constitutes the text-processing module of a deep neural network (DNN)-based Arabic TTS system. For Arabic text, this module starts by predicting the diacritic and gemination signs; in this work, that prediction is performed entirely with DNNs. The diacritized text is then converted to phonemes using the Buckwalter code. Compared with state-of-the-art approaches, the proposed system achieves a higher accuracy rate, both over all phonemes and for each class, as well as high precision, recall, and F1 score for each class of diacritic signs.

Figures: Fig. 1 to Fig. 11 (images not reproduced)

Funding

Funding was provided by Signal, Image and Technology of Information Laboratory, Electrical Engineering Department.

Author information

Corresponding author

Correspondence to Ikbel Hadj Ali.

About this article

Cite this article

Hadj Ali, I., Mnasri, Z. & Lachiri, Z. DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis. Int J Speech Technol 23, 569–584 (2020). https://doi.org/10.1007/s10772-020-09750-7

  • DOI: https://doi.org/10.1007/s10772-020-09750-7

Keywords