Abstract
This paper presents a new approach to unit selection for corpus-based speech synthesis, in which units are selected according to acoustic criteria. In a training stage, acoustic clustering is performed using context-dependent HMMs. In the synthesis stage, an acoustic target is generated and divided into segments corresponding to the required unit sequence; the acoustic unit sequence that best matches this target is then selected. Tests carried out show the relevance of the proposed method.
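The selection step described above amounts to a search over candidate units from the corpus for the sequence that is acoustically closest to the HMM-generated target. The sketch below is a minimal illustration of such a search, not the authors' implementation: the names select_units, target_cost and concat_cost are hypothetical, the frame-wise Euclidean distance stands in for a proper DTW-based acoustic distance, and the concatenation cost is included as a common complement even though the abstract only describes matching the acoustic target.

```python
import numpy as np

def target_cost(unit_features, target_segment):
    """Acoustic distance between a candidate unit and its target segment.
    A DTW alignment would normally handle length differences; here the
    shorter of the two lengths is used for brevity."""
    m = min(len(unit_features), len(target_segment))
    return float(np.linalg.norm(unit_features[:m] - target_segment[:m]))

def select_units(target_segments, candidates, concat_cost, w_concat=1.0):
    """Viterbi-style search for the candidate sequence closest to the target.

    target_segments : list of (frames x dim) arrays, one per required unit,
                      e.g. cepstral trajectories generated from the HMMs.
    candidates      : candidates[i] is a list of (frames x dim) arrays taken
                      from the recorded corpus for position i.
    concat_cost     : function(prev_unit, unit) -> float, acoustic mismatch
                      at the join between two consecutive units.
    """
    n = len(target_segments)
    # best[i][j] = (cumulative cost of ending at candidate j, back-pointer)
    best = [dict() for _ in range(n)]
    for j, u in enumerate(candidates[0]):
        best[0][j] = (target_cost(u, target_segments[0]), None)
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            tc = target_cost(u, target_segments[i])
            cost, back = min(
                (best[i - 1][k][0] + w_concat * concat_cost(v, u) + tc, k)
                for k, v in enumerate(candidates[i - 1]))
            best[i][j] = (cost, back)
    # backtrack from the cheapest final candidate
    j = min(best[n - 1], key=lambda k: best[n - 1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```

As a simple illustrative choice, concat_cost could measure the spectral jump at the join, e.g. lambda a, b: float(np.linalg.norm(a[-1] - b[0])); any weighting of target and concatenation terms is likewise an assumption of this sketch.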
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Rouibia, S., Rosec, O., Moudenc, T. (2005). Unit Selection for Speech Synthesis Based on Acoustic Criteria. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_36
DOI: https://doi.org/10.1007/11551874_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0