Abstract
This contribution describes experiments with emotional style conversion performed on utterances produced by the Czech and Slovak text-to-speech (TTS) system with cepstral description and basic prosody generated by rules. Emotional style conversion was realized both as post-processing of the TTS output speech signal and as a real-time implementation within the system. Emotional style prototypes representing three emotional states (sad, angry, and joyous) were obtained from sentences with the same information content. The difference in frame length between the prototype and the target utterance was resolved by linear time scale mapping (LTSM). The results were evaluated by a listening test of the resynthesized utterances.
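The abstract does not spell out how LTSM is formulated, so the following is only a minimal sketch of one plausible reading: a per-frame prototype contour is stretched or compressed onto the target utterance's frame count by mapping frame indices linearly and interpolating between neighbouring prototype frames. The function name `linear_time_scale_map`, the use of relative F0 ratios as the prototype, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
def linear_time_scale_map(prototype, target_len):
    """Resample a per-frame prototype contour to target_len frames.

    Assumed reading of LTSM: target frame k is mapped linearly onto the
    prototype's time axis, and the prototype value at that position is
    linearly interpolated from the two neighbouring prototype frames.
    """
    n = len(prototype)
    if n == 0 or target_len <= 0:
        raise ValueError("prototype must be non-empty and target_len positive")
    if n == 1 or target_len == 1:
        # Degenerate cases: nothing to interpolate between.
        return [float(prototype[0])] * target_len

    scale = (n - 1) / (target_len - 1)  # prototype frames per target frame
    mapped = []
    for k in range(target_len):
        pos = k * scale                 # fractional position in the prototype
        i = min(int(pos), n - 2)        # left neighbour, clamped at the end
        frac = pos - i
        mapped.append((1.0 - frac) * prototype[i] + frac * prototype[i + 1])
    return mapped


# Hypothetical data: a 3-frame prototype of relative F0 ratios applied
# to a 5-frame target F0 contour (values in Hz are made up).
prototype_f0_ratio = [1.0, 1.2, 0.9]
target_f0 = [110.0, 112.0, 115.0, 113.0, 108.0]

ratios = linear_time_scale_map(prototype_f0_ratio, len(target_f0))
converted_f0 = [f0 * r for f0, r in zip(target_f0, ratios)]
```

Mapping the endpoints onto each other keeps the first and last prototype frames aligned with the first and last target frames. Whether the original system interpolates in this way or uses a different scheme (for example, nearest-frame lookup) cannot be determined from the abstract.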
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Přibil, J., Přibilová, A. (2007). Emotional Style Conversion in the TTS System with Cepstral Description. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science, vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_6
DOI: https://doi.org/10.1007/978-3-540-76442-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76441-0
Online ISBN: 978-3-540-76442-7
eBook Packages: Computer Science, Computer Science (R0)