Abstract
Phonetically rich and balanced speech corpora are essential components of state-of-the-art automatic speech recognition (ASR) and text-to-speech (TTS) systems. The written form of a speech corpus must be prepared carefully so that its linguistic content is both rich and balanced. Such spoken and written corpora are scarce for Standard Arabic (SA): the only one available was prepared manually by expert linguists and phoneticians. In this work, we address the automatic preparation of written corpora that are rich in linguistic units. Our work builds on a comprehensive statistical linguistic study of SA, based on the automatic phonetic transcription of texts totaling more than 5 million words. We prepared two written corpora. The first contains all allophones in SA, with at least 3 occurrences of each allophone and 17 occurrences of each phoneme. The second contains, in addition to all allophones, 90.72% of the diphones in SA.
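The abstract does not spell out how sentences are selected, so the following is only a minimal sketch of one common way to assemble a unit-rich written corpus: a greedy coverage heuristic that repeatedly picks the sentence supplying the most still-missing unit occurrences. It is not presented as the authors' procedure. The function name `greedy_select`, the input layout, and the unit labels are all hypothetical; the per-unit thresholds in `min_count` could, for example, be set to 3 for every allophone to mirror the first corpus described above.

```python
from collections import Counter


def greedy_select(sentences, min_count):
    """Greedily pick sentences until every unit reaches its required count.

    sentences: list of (text, units) pairs, where units is a list of
               linguistic-unit labels (e.g. allophones or diphones) that an
               external phonetizer found in the text (assumed input format).
    min_count: dict mapping unit label -> minimum required occurrences.
    Returns (chosen_texts, remaining), where remaining maps each unit that
    could not be fully covered to the number of occurrences still missing.
    """
    remaining = {u: n for u, n in min_count.items() if n > 0}
    pool = [(text, Counter(units)) for text, units in sentences]
    chosen = []

    def gain(item):
        # Number of still-needed occurrences this sentence would supply.
        counts = item[1]
        return sum(min(counts[u], need) for u, need in remaining.items())

    while remaining and pool:
        best = max(pool, key=gain)
        if gain(best) == 0:
            break  # no sentence left in the pool helps any more
        pool.remove(best)
        chosen.append(best[0])
        for unit, count in best[1].items():
            if unit in remaining:
                remaining[unit] -= count
                if remaining[unit] <= 0:
                    del remaining[unit]
    return chosen, remaining
```

Calling `greedy_select(candidates, {"a": 2, "b": 1})` on a list of `(text, units)` pairs returns the selected texts together with any requirements that the candidate pool could not satisfy. A greedy pass like this does not guarantee the smallest possible corpus, but it is simple and usually yields a compact one.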
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sindran, F., Mualla, F., Haderlein, T., Daqrouq, K., Nöth, E. (2017). Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_23
DOI: https://doi.org/10.1007/978-3-319-64206-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science (R0)