Abstract
The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while being robust to noise. In this context, estimating the dialogue POMDP model components is a significant challenge as they have a direct impact on the optimized dialogue POMDP policy. To achieve such an estimation, we propose methods for learning dialogue POMDP model components using noisy and unannotated dialogues. Specifically, we introduce techniques to learn the set of possible user intentions from dialogues, use them as the dialogue POMDP states, and learn a maximum likelihood POMDP transition model from data. Since it is crucial to reduce the observation state size, we then propose two observation models: the keyword model and the intention model. Using these two models, the number of observations is reduced significantly while the POMDP performance remains high particularly in the intention POMDP. Learning states and observations sustaining a POMDP are both covered in this first part (part I) and experimented from dialogues collected by SmartWheeler (an intelligent wheelchair which aims to help persons with disabilities). Part II covers the reward model learning required by the POMDP.






Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Notice that these results are only based on the dialogue POMDP simulation; where there exists neither user utterance nor machine’s utterance but only the simulated action and observations.
References
Atrash, A., & Pineau, J. (2010). A Bayesian method for learning POMDP observation parameters for robot interaction management systems. In The POMDP practitioners workshop.
Blei, D. (2012). Introduction to probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Chinaei, H. R., Chaib-draa, B., & Lamontagne, L. (2009). Learning user intentions in spoken dialogue systems. In Proceedings of the 1st International Conference on Agents and Artificial Intelligence (ICAART’09), Porto, Portugal.
Choi, J., & Kim, K.-E. (2011). Inverse reinforcement learning in partially observable environments. Journal of Machine Learning Research, 12, 691–730.
Daud, A., Li, J., Zhou, L., & Muhammad, F. (2010). Knowledge discovery through directed probabilistic topic models: A survey. Frontiers of Computer Science in China, 4(2), 280–301.
Doshi, F., & Roy, N. (2007). Efficient model learning for dialog management. In Proceedings of the 2nd ACM SIGCHI/SIGART conference on Human-Robot Interaction (HRI’07), Arlington, Virginia, USA.
Doshi, F., & Roy, N. (2008). Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4), 299–318.
Gašić, M. (2011). Statistical dialogue modelling. PhD thesis, Department of Engineering, University of Cambridge.
Gruber, A., & Popat, A. (2007). Notes regarding computations in open htmm. http://openhtmm.googlecode.com/files/htmm_computations.pdf
Gruber, A., Rosen-Zvi, M., & Weiss, Y. (2007). Hidden topic Markov models. In Artificial intelligence and statistics (AISTATS’07), San Juan, Puerto Rico, USA.
Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2), 99–134.
Ko, Y., & Seo, J. (2004). Learning with unlabeled data for text categorization using bootstrapping and feature projection techniques. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (ACL’04), Barcelona, Spain.
Matsubara, S., Kimura, S., Kawaguchi, N., Yamaguchi, Y., & Inagaki, Y. (2002). Example-based speech intention understanding and its application to in-car spoken dialogue system. In Proceedings of the 19th International Conference on Computational linguistics (Vol. 1), Taipei, Taiwan.
Ng, A. Y., & Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning (ICML’00), Stanford, CA, USA.
Paek, T., & Pieraccini, R. (2008). Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, 50(8), 716–729.
Pineau, J., Gordon, G., & Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco, Mexico.
Pineau, J., West, R., Atrash, A., Villemure, J., & Routhier, F. (2011). On the feasibility of using a standardized test for evaluating a speech-controlled smart wheelchair. International Journal of Intelligent Control and Systems, 16(2), 124–131.
Png, S. & Pineau, J. (2011). Bayesian reinforcement learning for POMDP-based dialogue systems. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), Prague, Czech Republic.
Png, S., Pineau, J., & Chaib-Draa, B. (2012). Building adaptive dialogue systems via bayes-adaptive POMDPs. IEEE Journal of Selected Topics in Signal Processing, 6(8), 917–927.
Rabiner, L. R. (1990). Readings in speech recognition. In Chapter A tutorial on hidden Markov models and selected applications in speech recognition (pp. 267–296). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Roy, N., Pineau, J., & Thrun, S. (2000). Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL’00), Hong Kong.
Thomson, B. (2009). Statistical methods for spoken dialogue management. PhD thesis, Department of Engineering, University of Cambridge.
Weilhammer, K., Williams, J. D., & Young, S. (2004). The SACTI-2 corpus: Guide for research users. Cambridge University. Technical report.
Williams, J. D. (2006). Partially observable Markov decision processes for spoken dialogue management. PhD thesis, Department of Engineering, University of Cambridge.
Williams, J. D., & Young, S. (2005). The SACTI-1 corpus: Guide for research users. Department of Engineering, University of Cambridge. Technical report.
Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21, 393–422.
Zhang, B., Cai, Q., Mao, J., Chang, E., & Guo, B. (2001a). Spoken dialogue management as planning and acting under uncertainty. In Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech’01), Aalborg, Denmark.
Zhang, B., Cai, Q., Mao, J., & Guo, B. (2001b). Planning and acting under uncertainty: A new model for spoken dialogue system. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI’01), Seattle, Washington, USA.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chinaei, H.R., Chaib-draa, B. Dialogue POMDP components (part I): learning states and observations. Int J Speech Technol 17, 309–323 (2014). https://doi.org/10.1007/s10772-014-9244-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10772-014-9244-6