Abstract
In spoken dialog systems for humanoid robots, smooth turn-taking is one of the most important factors in realizing natural interaction with users. Speech collisions often occur when a user and the dialog system speak simultaneously. This study presents a method for generating fillers at the beginning of system utterances to signal an intention to take or hold the turn, as speakers do in human conversation. To this end, we analyzed the relationship between dialog context and fillers observed in a human-robot interaction corpus, in which a user talks with a humanoid robot remotely operated by a human. First, we annotated the corpus with dialog act tags and analyzed typical sequential pairs of dialog acts, called DA pairs. We found that the typical filler forms and their occurrence patterns differ according to the DA pair. We then built a machine learning model that predicts whether a filler occurs and its appropriate form from linguistic and prosodic features extracted from the preceding and following utterances. The experimental results show that the effective feature set also depends on the type of DA pair.
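The DA-pair analysis described in the abstract can be illustrated with a minimal sketch. It assumes a toy annotation format and is not the authors' implementation: the tag names, the filler forms, and the helper functions `filler_counts_by_da_pair` and `majority_filler` are all hypothetical, and a simple majority-vote baseline stands in for the machine learning model, which in the paper also uses linguistic and prosodic features.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (dialog_act, leading_filler or None) per utterance.
# Tag names and filler forms are illustrative only, not taken from the corpus.
utterances = [
    ("question", None),
    ("answer", "eto"),
    ("question", None),
    ("answer", "eto"),
    ("statement", None),
    ("question", "ano"),
    ("answer", None),
]


def filler_counts_by_da_pair(utts):
    """Tally the filler (or None) that opens the second utterance of each DA pair."""
    counts = defaultdict(Counter)
    # Walk over consecutive utterances; each adjacent pair of tags is one DA pair.
    for (prev_da, _), (next_da, filler) in zip(utts, utts[1:]):
        counts[(prev_da, next_da)][filler] += 1
    return counts


def majority_filler(counts, da_pair):
    """Baseline: the most frequent filler choice seen for this DA pair.

    Returns None both when the pair is unseen and when "no filler" is the
    most frequent choice.
    """
    if da_pair not in counts:
        return None
    return counts[da_pair].most_common(1)[0][0]


counts = filler_counts_by_da_pair(utterances)
```

With such counts, `majority_filler(counts, ("question", "answer"))` picks the filler form that most often opened an answer following a question in the toy data. A feature-based classifier trained separately for each DA pair would replace this lookup in a fuller system.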
Acknowledgements
This work was supported by JST ERATO Ishiguro Symbiotic Human-Robot Interaction program (Grant Number JPMJER1401), Japan.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nakanishi, R., Inoue, K., Nakamura, S., Takanashi, K., Kawahara, T. (2019). Generating Fillers Based on Dialog Act Pairs for Smooth Turn-Taking by Humanoid Robot. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_8
DOI: https://doi.org/10.1007/978-981-13-9443-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9442-3
Online ISBN: 978-981-13-9443-0
eBook Packages: Literature, Cultural and Media Studies (R0)