{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T13:55:06Z","timestamp":1723211706227},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100004963","name":"Seventh Framework Programme","doi-asserted-by":"publisher","award":["216594"],"id":[{"id":"10.13039\/501100004963","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Speech Lang. Process."],"published-print":{"date-parts":[[2011,5]]},"abstract":"Spoken Dialogue Systems (SDS) are systems which have the ability to interact with human beings using natural language as the medium of interaction. A dialogue policy plays a crucial role in determining the functioning of the dialogue management module. Handcrafting the dialogue policy is not always an option, considering the complexity of the dialogue task and the stochastic behavior of users. In recent years approaches based on Reinforcement Learning (RL) for policy optimization in dialogue management have been proved to be an efficient approach for dialogue policy optimization. Yet most of the conventional RL algorithms are data intensive and demand techniques such as user simulation. Doing so, additional modeling errors are likely to occur. This paper explores the possibility of using a set of approximate dynamic programming algorithms for policy optimization in SDS. Moreover, these algorithms are combined to a method for learning a sparse representation of the value function. Experimental results show that these algorithms when applied to dialogue management optimization are particularly sample efficient, since they learn from few hundreds of dialogue examples. 
These algorithms learn in an off-policy manner, meaning that they can learn optimal policies with dialogue examples generated with a quite simple strategy. Thus they can learn good dialogue policies directly from data, avoiding user modeling errors.","DOI":"10.1145\/1966407.1966412","type":"journal-article","created":{"date-parts":[[2011,6,6]],"date-time":"2011-06-06T11:51:38Z","timestamp":1307361098000},"page":"1-21","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["Sample-efficient batch reinforcement learning for dialogue management optimization"],"prefix":"10.1145","volume":"7","author":[{"given":"Olivier","family":"Pietquin","sequence":"first","affiliation":[{"name":"Sup\u00e9lec and UMI 2958 (GeorgiaTech - CNRS), Metz, France"}]},{"given":"Matthieu","family":"Geist","sequence":"additional","affiliation":[{"name":"Sup\u00e9lec, Metz, France"}]},{"given":"Senthilkumar","family":"Chandramohan","sequence":"additional","affiliation":[{"name":"Sup\u00e9lec, Metz, France"}]},{"given":"Herv\u00e9","family":"Frezza-Buet","sequence":"additional","affiliation":[{"name":"Sup\u00e9lec and UMI 2958 (GeorgiaTech - CNRS), Metz, France"}]}],"member":"320","published-online":{"date-parts":[[2011,6,6]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Dynamic Programming","author":"Bellman R.","unstructured":"Bellman, R. 1957. Dynamic Programming 6th Ed. Dover Publications.","edition":"6"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.2307\/2002797"},{"key":"e_1_2_1_3_1","first-page":"155","article-title":"Polynomial approximation\u2014a new computational technique in dynamic programming: allocation processes","volume":"17","author":"Bellman R.","year":"1973","unstructured":"Bellman, R., Kalaba, R., and Kotkin, B. 1973. 
Polynomial approximation\u2014a new computational technique in dynamic programming: allocation processes. Math. Computat. 17, 155--161.","journal-title":"Math. Computat."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114723"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the International Conference on Speech Communication and Technologies (Interspeech'10)","author":"Chandramohan S.","unstructured":"Chandramohan, S., Geist, M., and Pietquin, O. 2010a. Optimizing spoken dialogue management with fitted value iteration. In Proceedings of the International Conference on Speech Communication and Technologies (Interspeech'10). ISCA, 86--89."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 11th SIGDial Conference on Discourse and Dialogue. ACL, 107--115","author":"Chandramohan S.","unstructured":"Chandramohan, S., Geist, M., and Pietquin, O. 2010b. Sparse approximate dynamic programming for dialog management. In Proceedings of the 11th SIGDial Conference on Discourse and Dialogue. ACL, 107--115."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). 80--87","author":"Eckert W.","unstructured":"Eckert, W., Levin, E., and Pieraccini, R. 1997. User Modeling for Spoken Dialogue System Evaluation. 
In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). 80--87."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2004.830985"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1088690"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of SIGDIAL'10","author":"Gasic M.","unstructured":"Gasic, M., Jurcicek, F., Keizer, S., Mairesse, F., Thomson, B., Yu, K., and Young, S. 2010. Gaussian processes for fast policy optimisation of pomdp-based dialogue managers. In Proceedings of SIGDIAL'10."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the Workshop on Active Learning and Experimental Design (AL&E Collocated with AISTAT'10)","author":"Geist M.","unstructured":"Geist, M. and Pietquin, O. 2010a. Managing uncertainty within the ktd framework. In Proceedings of the Workshop on Active Learning and Experimental Design (AL&E Collocated with AISTAT'10)."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT'10)","author":"Geist M.","unstructured":"Geist, M. and Pietquin, O. 2010b. 
Statistically linearized least-squares temporal differences. In Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT'10). IEEE."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50040-2"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2008.07-028-R2-05-82"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of Interspeech.","author":"Jurcicek F.","unstructured":"Jurcicek, F., Thomson, B., Keizer, S., Gasic, M., Mairesse, F., Yu, K., and Young, S. 2010. Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems. In Proceedings of Interspeech."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964290"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324900002539"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Meeting of the European Chapter of the Association for Computational Linguistics (EACL'06)","author":"Lemon O.","unstructured":"Lemon, O., Georgila, K., Henderson, J., and Stuttle, M. 2006. An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the TALK in-car system. 
In Proceedings of the Meeting of the European Chapter of the Association for Computational Linguistics (EACL'06)."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the European Conference on Speech Communication and Technologies (Interspeech'07)","author":"Lemon O.","unstructured":"Lemon, O. and Pietquin, O. 2007. Machine Learning for Spoken Dialogue Systems. In Proceedings of the European Conference on Speech Communication and Technologies (Interspeech'07). 2685--2688."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP'98)","author":"Levin E.","unstructured":"Levin, E. and Pieraccini, R. 1998. Using markov decision process for learning dialogue strategies. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP'98)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.817450"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the International Conference on Speech Communication and Technologies (InterSpeech'09)","author":"Li L.","unstructured":"Li, L., Balakrishnan, S., and Williams, J. 2009. Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection. 
In Proceedings of the International Conference on Speech Communication and Technologies (InterSpeech'09)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1991.3.2.246"},{"key":"e_1_2_1_24_1","unstructured":"Pietquin, O. 2004. A Framework for Unsupervised Learning of Dialogue Strategies. SIMILAR Collection. Presses Universitaires de Louvain."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 7th IEEE International Conference on Multimedia and Expo. 425--428","author":"Pietquin O.","year":"2006","unstructured":"Pietquin, O. 2006a. Consistent Goal-Directed User Model for Realistic Man-Machine Task-Oriented Spoken Dialogue Simulation. In Proceedings of the 7th IEEE International Conference on Multimedia and Expo. 425--428."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/11861461_19"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'02)","author":"Pietquin O.","unstructured":"Pietquin, O. and Renals, S. 2002. ASR System Modeling For Automatic Evaluation And Optimization of Dialogue Systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'02). Vol. I. 45--48."},{"key":"e_1_2_1_28_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman M. 
L.","unstructured":"Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL\/HLT '08)","author":"Rieser V.","unstructured":"Rieser, V. and Lemon, O. 2008. Learning effective multimodal dialogue strategies from wizard-of-oz data: Bootstrapping and evaluation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL\/HLT '08)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.33.0210"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of workshop on Automatic Speech Recognition and Understanding (ASRU'05)","author":"Schatzmann J.","unstructured":"Schatzmann, J., Stuttle, M. N., Weilhammer, K., and Young, S. 2005. Effects of the user model on simulation-based learning of dialogue strategies. In Proceedings of workshop on Automatic Speech Recognition and Understanding (ASRU'05)."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Workshop on Automatic Speech Recognition and Understanding (ASRU'07)","author":"Schatzmann J.","unstructured":"Schatzmann, J., Thomson, B., and Young, S. 2007. Error simulation for training statistical dialogue systems. 
In Proceedings of the International Workshop on Automatic Speech Recognition and Understanding (ASRU'07)."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of NAACL Workshop on Adaptation in Dialogue Systems.","author":"Scheffler K.","unstructured":"Scheffler, K. and Young, S. 2001. Corpus-based dialogue simulation for automatic strategy learning and evaluation. In Proceedings of NAACL Workshop on Adaptation in Dialogue Systems."},{"key":"e_1_2_1_34_1","volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond","author":"Scholkopf B.","year":"2001","unstructured":"Scholkopf, B. and Smola, A. J. 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the Annual Meeting of the Neural Information Processing Society (NIPS'99)","author":"Singh S.","unstructured":"Singh, S., Kearns, M., Litman, D., and Walker, M. 1999. Reinforcement learning for spoken dialogue systems. In Proceedings of the Annual Meeting of the Neural Information Processing Society (NIPS'99). 
Springer."},{"key":"e_1_2_1_36_1","volume-title":"Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning)","author":"Sutton R. S.","year":"1998","unstructured":"Sutton, R. S. and Barto, A. G. 1998. Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning) 3rd Ed. The MIT Press.","edition":"3"},{"key":"e_1_2_1_37_1","unstructured":"W3C 2008. VoiceXML 3.0 Specifications. W3C. http:\/\/www.w3.org\/TR\/voicexml30\/."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/976909.979652"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.06.008"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2007.899161"}],"container-title":["ACM Transactions on Speech and Language 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1966407.1966412","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T20:34:46Z","timestamp":1672346086000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1966407.1966412"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,5]]}},"alternative-id":["10.1145\/1966407.1966412"],"URL":"https:\/\/doi.org\/10.1145\/1966407.1966412","relation":{},"ISSN":["1550-4875","1550-4883"],"issn-type":[{"value":"1550-4875","type":"print"},{"value":"1550-4883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,5]]},"assertion":[{"value":"2010-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}