Response selection and turn-taking for a sensitive artificial listening agent

Author:

Maat, Mark ter

Description:

This thesis focusses on two aspects of the interaction between a user and a virtual human, namely the perception of turn-taking strategies and the selection of appropriate responses. This research was carried out in the context of the SEMAINE project, in which a virtual listening agent was built: a virtual agent that tries to keep the user talking for as long as possible. The system consists of four specific characters, each with a certain emotional state: a happy, a gloomy, an aggressive, and a pragmatic one. The first part describes a study of how different turn-taking strategies used by a dialogue system influence the perception that users have of that system. These turn-taking strategies are different start times of the next turn (starting before the user's turn has finished, directly when it finishes, or after a small pause) and different reactions when overlapping speech is detected (stop speaking, continue normally, or continue with a raised voice). These strategies were evaluated in two studies. In the first study, users listened to simulated, unintelligible conversations in which one participant used a predetermined turn-taking strategy. In the second study, users were interviewed by a dialogue system, but the exact timing of each question was controlled by a human wizard. After each study, the users completed a questionnaire containing semantic differential scales about how they perceived the participant in the conversation. The final part describes the response selection of the listening agent. We decided to select an appropriate response based on the non-verbal input, rather than on the content of the user's speech, to make the listening agent capable of responding appropriately regardless of the topic. This thesis first describes the handcrafted models and then a more data-driven approach. In this approach, humans annotated videos containing user turns with appropriate possible responses. Classifiers were then used to learn how to respond after a user's turn. The classifiers were tested by letting them predict appropriate responses for new fragments and letting humans rate these responses. We found that some classifiers produced significantly more appropriate responses than a random model.

Publisher:

University of Twente

Year of Publication:

2011

Document Type:

Doctoral thesis ; [Doctoral and postdoctoral thesis]

DDC:

004 Data processing & computer science (computed)

Relations:

http://doc.utwente.nl/78566/1/thesis_M_ter_Maat.pdf ; http://purl.utwente.nl/publications/78566

URL:

http://purl.utwente.nl/publications/78566