ISCA Archive Interspeech 2018

Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers

Kohei Hara, Koji Inoue, Katsuya Takanashi, Tatsuya Kawahara

We address the prediction of turn-taking by considering related behaviors such as backchannels and fillers. Listeners use backchannels to signal that the current speaker may keep the turn, whereas prospective speakers use fillers to indicate an intention to take the turn. We propose a turn-taking model based on multitask learning, trained jointly with the prediction of backchannels and fillers. Sharing LSTM neural networks across these tasks allows for efficient and generalized learning and thus improves prediction accuracy. Evaluations on two human-robot interaction dialogue corpora demonstrate that the proposed multitask learning scheme outperforms conventional single-task learning.
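The idea of one LSTM shared by three prediction tasks can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the input features, layer sizes, output layers, or loss, so the names `SharedLSTM`, `MultitaskModel`, and `multitask_loss`, the sigmoid output heads, the weighted sum of binary cross-entropies, and all dimensions are assumptions chosen only to show the structure.

```python
import math
import random

TASKS = ("turn_taking", "backchannel", "filler")

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class SharedLSTM:
    """Single-layer LSTM encoder shared by all tasks (sizes are hypothetical)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate, applied to [x_t ; h_{t-1}].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng)
                  for k in ("i", "f", "o", "g")}
        self.b = {k: [0.0] * hidden_size for k in ("i", "f", "o", "g")}

    def forward(self, frames):
        """Run over a sequence of feature vectors and return the last hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in frames:
            z = list(x) + h  # concatenate input and previous hidden state
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in self.W}
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
        return h

class MultitaskModel:
    """Shared encoder with one sigmoid output head per task (assumed design)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.encoder = SharedLSTM(input_size, hidden_size, rng)
        self.heads = {t: [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for t in TASKS}

    def predict(self, frames):
        h = self.encoder.forward(frames)  # computed once, reused by every head
        return {t: sigmoid(sum(w * hj for w, hj in zip(self.heads[t], h)))
                for t in TASKS}

def binary_cross_entropy(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multitask_loss(probs, labels, weights=None):
    """Weighted sum of per-task losses; the weighting scheme is an assumption."""
    weights = weights or {t: 1.0 for t in TASKS}
    return sum(weights[t] * binary_cross_entropy(probs[t], labels[t]) for t in TASKS)

# Toy usage: five frames of three-dimensional placeholder features.
model = MultitaskModel(input_size=3, hidden_size=4, seed=0)
frames = [[0.1 * t, 0.2, -0.1] for t in range(5)]
probs = model.predict(frames)
labels = {"turn_taking": 1, "backchannel": 0, "filler": 1}
loss = multitask_loss(probs, labels)
```

Because every head reads the same hidden state, minimizing the combined loss would push gradients from all three tasks into the shared encoder, which is the mechanism the abstract credits for more generalized learning. Training is omitted here; a real system would use an automatic differentiation framework rather than this hand-written forward pass.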