{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T07:34:29Z","timestamp":1723016069797},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8]]},"abstract":"Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.<\/jats:p>","DOI":"10.24963\/ijcai.2019\/432","type":"proceedings-article","created":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:46:05Z","timestamp":1564285565000},"page":"3116-3122","source":"Crossref","is-referenced-by-count":3,"title":["Monte Carlo Tree Search for Policy Optimization"],"prefix":"10.24963","author":[{"given":"Xiaobai","family":"Ma","sequence":"first","affiliation":[{"name":"Aeronautics and Astronautics Department, Stanford University"}]},{"given":"Katherine","family":"Driggs-Campbell","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, University of Illinois Urbana-Champaign"}]},{"given":"Zongzhang","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, Nanjing University"}]},{"given":"Mykel J.","family":"Kochenderfer","sequence":"additional","affiliation":[{"name":"Aeronautics and Astronautics Department, Stanford University"}]}],"member":"10584","event":{"number":"28","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2019","name":"Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}","start":{"date-parts":[[2019,8,10]]},"theme":"Artificial Intelligence","location":"Macao, China","end":{"date-parts":[[2019,8,16]]}},"container-title":["Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:49:15Z","timestamp":1564285755000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2019\/432"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2019\/432","relation":{},"subject":[],"published":{"date-parts":[[2019,8]]}}}