Abstract
This study investigates the impact of the phonetization and phonetic segmentation of training corpora on the quality of HMM-based TTS synthesis. HMM-based TTS requires phonetic symbols aligned with the speech corpus in order to train the models used for synthesis. Phonetic annotation is a complex task, since pronunciation usually differs from spelling and also varies across regional accents. In this paper, the infrastructure of a French TTS system is presented. Several systems were trained, one per condition, on versions of a corpus in which the occurrences of phonetic labels were systematically modified (number of schwas and liaisons) and the label boundaries were displaced. A perceptual evaluation of the influence of labeling accuracy on synthetic speech quality was conducted. Despite the extent of the annotation changes, the synthetic speech quality of the five best systems remained close to that of the reference system, which was built on the corpus with manually corrected labels.
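To make the two kinds of label manipulation concrete, here is a minimal sketch of how degraded training labels could be derived from a reference alignment. It is not the authors' procedure, which the abstract does not detail. It assumes HTK-style label files (one `start end phone` line per segment, times in 100 ns units), uses `@` as a hypothetical schwa symbol, and applies a hypothetical uniform boundary offset. The helper names (`parse_lab`, `delete_phone`, `shift_boundaries`, `format_lab`) are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # assumed HTK time units (100 ns)
    end: int
    phone: str


def parse_lab(text: str) -> list[Segment]:
    """Parse 'start end phone' lines (hypothetical HTK-style label format)."""
    segs = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()[:3]
        segs.append(Segment(int(start), int(end), phone))
    return segs


def format_lab(segs: list[Segment]) -> str:
    """Serialize segments back to 'start end phone' lines."""
    return "\n".join(f"{s.start} {s.end} {s.phone}" for s in segs)


def delete_phone(segs: list[Segment], phone: str) -> list[Segment]:
    """Remove every `phone` segment (e.g. a schwa), giving its duration
    to the preceding segment, or to the following one if it comes first."""
    out: list[Segment] = []
    pending_start = None  # start time inherited from leading deleted segments
    for s in segs:
        if s.phone == phone:
            if out:
                prev = out[-1]
                out[-1] = Segment(prev.start, s.end, prev.phone)
            elif pending_start is None:
                pending_start = s.start
            continue
        if pending_start is not None:
            s = Segment(pending_start, s.end, s.phone)
            pending_start = None
        out.append(s)
    return out


def shift_boundaries(segs: list[Segment], offset: int,
                     min_dur: int = 50_000) -> list[Segment]:
    """Move every internal boundary by `offset` units, clamping so that each
    segment stays at least `min_dur` long. Utterance start and end are kept."""
    if any(s.end - s.start < min_dur for s in segs):
        raise ValueError("every input segment must be at least min_dur long")
    out = [Segment(s.start, s.end, s.phone) for s in segs]  # do not mutate input
    for i in range(len(out) - 1):
        lo = out[i].start + min_dur        # left segment keeps min_dur
        hi = out[i + 1].end - min_dur      # right segment keeps min_dur
        b = max(lo, min(out[i].end + offset, hi))
        out[i].end = b
        out[i + 1].start = b
    return out


LAB = """0 1000000 sil
1000000 1800000 p
1800000 2400000 @
2400000 3200000 t
3200000 4000000 i
4000000 5000000 sil"""

if __name__ == "__main__":
    segs = parse_lab(LAB)
    print(format_lab(delete_phone(segs, "@")))        # schwa-deletion condition
    print(format_lab(shift_boundaries(segs, 100_000)))  # boundaries moved by 10 ms
```

Deleting a segment by merging it into a neighbour keeps the utterance timeline contiguous, which forced-alignment-based training generally expects. A liaison condition could be built analogously by splitting a segment to insert a liaison consonant.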
Acknowledgments
This work is presented in the context of the “AND T-R” project (FUI-11 OSEO/DGCIS), funded by the Île-de-France region, the Seine-Saint-Denis General Council, and the City of Paris.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Evrard, M., Rilliard, A., d’Alessandro, C. (2015). Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_7
DOI: https://doi.org/10.1007/978-3-319-25789-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)