Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

Yoshimura, Takayoshi; Tokuda, Keiichi; Masuko, Takashi; Kobayashi, Takao; Kitamura, Tadashi

doi:10.21437/Eurospeech.1999-596

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameters, the pitch parameters and the state duration are clustered independently using a decision-tree based context clustering technique. Synthetic speech is generated using a speech parameter generation algorithm from HMMs and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.
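As an illustration of how a multi-space probability distribution handles pitch, the sketch below evaluates the likelihood of a single pitch observation in one HMM state. It assumes the common two-space setup: a one-dimensional Gaussian over log F0 for voiced frames and a zero-dimensional space for unvoiced frames. The function name, parameter names and numeric values are hypothetical and are not taken from the paper.

```python
import math


def msd_pitch_likelihood(obs, w_voiced, mean, var):
    """Likelihood of one pitch observation under a two-space MSD state.

    obs      -- log F0 (float) for a voiced frame, or None for an unvoiced frame
    w_voiced -- weight of the voiced space; the unvoiced weight is 1 - w_voiced
    mean     -- mean of the voiced-space Gaussian (assumed 1-D, over log F0)
    var      -- variance of the voiced-space Gaussian
    """
    if not 0.0 <= w_voiced <= 1.0:
        raise ValueError("w_voiced must lie in [0, 1]")
    if var <= 0.0:
        raise ValueError("var must be positive")

    if obs is None:
        # Unvoiced frame: the zero-dimensional space contributes only its weight.
        return 1.0 - w_voiced

    # Voiced frame: space weight times the Gaussian density at the observation.
    density = math.exp(-0.5 * (obs - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return w_voiced * density


# Hypothetical state: 80% voiced, log F0 centred at 5.0 with variance 0.04.
voiced = msd_pitch_likelihood(5.1, w_voiced=0.8, mean=5.0, var=0.04)
unvoiced = msd_pitch_likelihood(None, w_voiced=0.8, mean=5.0, var=0.04)
```

Because the voiced/unvoiced decision is carried by the space weights, a single state can score both kinds of frame without a separate voicing classifier.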
