ISCA Archive - Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
ISCA Archive Eurospeech 1999

Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameters, the pitch parameter and the state duration are clustered independently using a decision-tree-based context clustering technique. Synthetic speech is generated using an algorithm for speech parameter generation from HMMs and a mel-cepstrum-based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.
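
As an illustration of the multi-space probability distribution idea used for pitch, the following minimal sketch computes a per-frame log output probability in which a voiced frame carries a one-dimensional log F0 value and an unvoiced frame carries no value at all. It assumes a single Gaussian for the voiced space; the function name `msd_log_prob` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def msd_log_prob(obs, w_voiced, mean, var):
    """Log output probability of one pitch observation under a two-space model.

    obs      -- log F0 as a float for a voiced frame, or None for an unvoiced frame
    w_voiced -- weight of the voiced space (the unvoiced weight is 1 - w_voiced)
    mean,var -- parameters of the assumed single Gaussian over log F0
    """
    if not 0.0 < w_voiced < 1.0:
        raise ValueError("w_voiced must lie strictly between 0 and 1")
    if var <= 0.0:
        raise ValueError("var must be positive")

    if obs is None:
        # Unvoiced space is zero-dimensional: only its weight contributes.
        return math.log(1.0 - w_voiced)

    # Voiced space: space weight times a one-dimensional Gaussian density.
    log_gauss = -0.5 * (math.log(2.0 * math.pi * var) + (obs - mean) ** 2 / var)
    return math.log(w_voiced) + log_gauss
```

For example, `msd_log_prob(None, 0.8, 5.0, 0.04)` scores an unvoiced frame using only the unvoiced weight, while `msd_log_prob(5.0, 0.8, 5.0, 0.04)` scores a voiced frame whose log F0 equals the mean.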