ISCA Archive IberSPEECH 2022

Automatic Detection of Inconsistencies in Open-Domain Chatbots

Jorge Mira Prats, Marcos Estecha-Garitagoitia, Mario Rodríguez-Cantelar, Luis Fernando D’Haro

Current pre-trained Large Language Models applied to chatbots are capable of producing good-quality sentences, handling different conversation topics, and sustaining longer interactions. Unfortunately, the generated responses depend heavily on the data on which the chatbot was trained, the specific dialogue history and current turn used to guide the response, the internal decoding mechanisms, and the ranking strategies, among other factors. Therefore, for the same question asked by the user, the chatbot may provide different answers, which in a long-term interaction can produce confusion. In this paper, we propose a new methodology based on three phases: a) automatic detection of dialogue topics using zero-shot learning approaches, b) automatic clustering of distinctive questions, and c) detection of inconsistent answers using K-Means clustering and the Silhouette coefficient. To test our proposal, we used the DailyDialog dataset to detect up to 13 different topics. To detect inconsistencies, we manually generated multiple paraphrased questions and then used multiple pre-trained chatbots to answer them. Our results show a weighted F1 score of 0.658 for topic detection and an MSE of 3.4 when predicting the number of different responses.
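
As an illustration of phase c), the following is a minimal sketch (not the authors' implementation) of how answers to paraphrases of the same question could be embedded and grouped with K-Means, using the Silhouette coefficient to pick the number of clusters, i.e., the estimated number of distinct (potentially inconsistent) answers. The embedding model name and the example data are assumptions made for the sketch.

# Minimal sketch, assuming sentence-transformers and scikit-learn are available.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_num_distinct_answers(answers, max_k=None):
    """Estimate how many semantically distinct answers the chatbot produced."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode(answers)

    # Try several cluster counts and keep the one with the best Silhouette score.
    max_k = max_k or len(answers) - 1
    best_k, best_score = 1, -1.0
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Hypothetical answers collected for paraphrases of the same question;
# a result greater than 1 suggests the chatbot answered inconsistently.
answers = [
    "I live in Madrid.",
    "My home is in Madrid, Spain.",
    "I'm based in Madrid.",
    "I live in Paris.",
]
print(estimate_num_distinct_answers(answers))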