Unit Selection for Speech Synthesis Based on Acoustic Criteria

  • Conference paper
Text, Speech and Dialogue (TSD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Abstract

This paper presents a new approach to unit selection for corpus-based speech synthesis, in which units are selected according to acoustic criteria. In the training stage, acoustic clustering is carried out using context-dependent HMMs. In the synthesis stage, an acoustic target is generated and divided into segments corresponding to the required unit sequence; the acoustic unit sequence that best matches this target is then selected. Tests carried out show the relevance of the proposed method.
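The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure or the search procedure, so the sketch assumes that each target segment and each candidate unit is summarized by a single feature vector, that the match is measured by Euclidean distance, and that the best sequence is found by dynamic programming. The function name `select_units` and the optional `join_weight` concatenation term are hypothetical and serve only as illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def select_units(targets, candidates, join_weight=0.0):
    """Pick one candidate unit per target segment by dynamic programming.

    targets:     one feature vector per required unit (assumption: a
                 per-segment summary of the generated acoustic target).
    candidates:  one list per segment of (unit_id, feature_vector) pairs.
    join_weight: weight of a hypothetical concatenation cost between
                 consecutive units; 0.0 means pure target matching.
    Returns (total_cost, [unit_id, ...]).
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need exactly one candidate list per target segment")
    if any(not segment for segment in candidates):
        raise ValueError("every segment needs at least one candidate unit")

    # best[j] = (cost of the cheapest path ending in candidate j, that path)
    best = [(dist(targets[0], feat), [uid]) for uid, feat in candidates[0]]

    for i in range(1, len(targets)):
        new_best = []
        for uid, feat in candidates[i]:
            target_cost = dist(targets[i], feat)
            # Cheapest predecessor, including the assumed join cost.
            prev_cost, prev_path = min(
                (
                    (cost + join_weight * dist(candidates[i - 1][k][1], feat), path)
                    for k, (cost, path) in enumerate(best)
                ),
                key=lambda item: item[0],
            )
            new_best.append((prev_cost + target_cost, prev_path + [uid]))
        best = new_best

    return min(best, key=lambda item: item[0])


if __name__ == "__main__":
    # Toy data: two target segments, two candidate units for each.
    targets = [(0.0, 0.0), (1.0, 1.0)]
    candidates = [
        [("a1", (0.1, 0.0)), ("a2", (2.0, 2.0))],
        [("b1", (5.0, 5.0)), ("b2", (1.0, 0.9))],
    ]
    cost, path = select_units(targets, candidates)
    print(path, round(cost, 3))
```

With `join_weight=0.0` the search reduces to choosing, for each segment independently, the candidate closest to the target. A positive weight trades some target accuracy for smoother transitions between consecutive units, which is a common design in unit selection but is an assumption here rather than something the abstract specifies.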

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rouibia, S., Rosec, O., Moudenc, T. (2005). Unit Selection for Speech Synthesis Based on Acoustic Criteria. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_36

  • DOI: https://doi.org/10.1007/11551874_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28789-6

  • Online ISBN: 978-3-540-31817-0

  • eBook Packages: Computer Science, Computer Science (R0)
