ISCA Archive Interspeech 2019

The Role of Voice Quality in the Perception of Prominence in Synthetic Speech

Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl

This paper explores how prominence can be modelled in speech synthesis through voice quality variation. Synthetic utterances varying in voice quality (breathy, modal, tense) were generated using a glottal source model in which the global waveshape parameter Rd was the main control parameter and f0 was not varied. A perception experiment based on a manipulation task was conducted to establish which Rd values are perceptually salient in the signalling of focus. Participants were presented with mini-dialogues designed to elicit narrow focus (with different focal syllable locations) and were asked to manipulate an unknown parameter in the synthetic utterances to produce a natural response. The results showed that participants manipulated Rd not only in focal syllables but also in the pre- and postfocal material. The direction of Rd manipulation in the focal syllables was the same across the three voice qualities: towards decreased Rd values (tenser phonation). The magnitude of the decrease in Rd was significantly smaller for tense voice than for breathy and modal voice, but did not vary with the location of the focal syllable in the utterance. Overall, the results suggest that Rd is effective as a control parameter for modelling prominence in synthetic speech.
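
To illustrate how a single global parameter such as Rd can drive a glottal source model, the sketch below maps Rd to the waveshape parameters Ra, Rk and Rg of the LF model. It assumes the commonly cited regression equations from Fant (1995); the abstract does not state which mapping the authors used, so the function name `rd_to_lf_params`, the accepted Rd range and the example values are assumptions made for illustration only.

```python
def rd_to_lf_params(rd: float) -> dict:
    """Predict LF-model R-parameters from Rd.

    Assumption: the regressions attributed to Fant (1995), usually
    applied for Rd roughly between 0.3 (tense) and 2.7 (breathy).
    """
    if not 0.3 <= rd <= 2.7:
        raise ValueError("Rd outside the range assumed for this sketch")
    ra = (-1.0 + 4.8 * rd) / 100.0    # return-phase parameter
    rk = (22.4 + 11.8 * rd) / 100.0   # glottal pulse skew parameter
    # Rg follows from the definition
    #   Rd = (1 / 0.11) * (0.5 + 1.2 * Rk) * (Rk / (4 * Rg) + Ra)
    rg = rk / (4.0 * (0.11 * rd / (0.5 + 1.2 * rk) - ra))
    return {"Ra": ra, "Rk": rk, "Rg": rg}


# Illustrative Rd values only, not values reported in the paper.
for label, rd in [("lower Rd (tenser)", 0.6), ("higher Rd (breathier)", 2.0)]:
    print(label, rd_to_lf_params(rd))
```

Under this mapping, lowering Rd shrinks Ra and Rk and raises Rg, which is consistent with the abstract's description of decreased Rd values as tenser phonation.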