{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T10:40:34Z","timestamp":1717411234565},"reference-count":40,"publisher":"World Scientific Pub Co Pte Ltd","issue":"02","funder":[{"DOI":"10.13039\/501100004543","name":"china scholarship council","doi-asserted-by":"publisher","award":["201706990015"],"id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Info. Tech. Dec. Mak."],"published-print":{"date-parts":[[2023,3]]},"abstract":"The landmark achievements of AlphaGo Zero have created great research interest into self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then used itself in tree searches. The training is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost to explore the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. Through multi-objective analysis, we identify four important hyper-parameters to further assess. To start, we find surprising results where too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game episodes and training epochs. As a consequence of our experiments, we provide recommendations on setting hyper-parameter values in self-play. The outer loop of self-play iterations should be emphasized, in favor of the inner loop. This means hyper-parameters for the inner loop, should be set to lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.","DOI":"10.1142\/s0219622022500547","type":"journal-article","created":{"date-parts":[[2022,8,19]],"date-time":"2022-08-19T03:29:16Z","timestamp":1660879756000},"page":"829-853","source":"Crossref","is-referenced-by-count":2,"title":["Analysis of Hyper-Parameters for AlphaZero-Like Deep Reinforcement Learning"],"prefix":"10.1142","volume":"22","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-1799-6273","authenticated-orcid":false,"given":"Hui","family":"Wang","sequence":"first","affiliation":[{"name":"Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands"}]},{"given":"Michael","family":"Emmerich","sequence":"additional","affiliation":[{"name":"Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands"}]},{"given":"Mike","family":"Preuss","sequence":"additional","affiliation":[{"name":"Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands"}]},{"given":"Aske","family":"Plaat","sequence":"additional","affiliation":[{"name":"Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands"}]}],"member":"219","published-online":{"date-parts":[[2022,9,24]]},"reference":[{"key":"S0219622022500547BIB001","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"S0219622022500547BIB002","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"S0219622022500547BIB003","doi-asserted-by":"publisher","DOI":"10.1126\/science.aar6404"},{"issue":"2","key":"S0219622022500547BIB004","first-page":"114","volume":"2","author":"Tao J.","year":"2016","journal-title":"Journal of Command and Control"},{"issue":"6","key":"S0219622022500547BIB005","doi-asserted-by":"crossref","first-page":"125","DOI":"10.21037\/atm.2016.03.25","volume":"4","author":"Zhang Z.","year":"2016","journal-title":"Annals of Translational Medicine"},{"key":"S0219622022500547BIB008","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2012.2186810"},{"key":"S0219622022500547BIB009","first-page":"724","volume-title":"Proc. 6th Int. Conf. Agents and Artificial Intelligence 2014","volume":"1","author":"Ruijl B.","year":"2014"},{"key":"S0219622022500547BIB010","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"S0219622022500547BIB011","first-page":"1766","volume-title":"International Conference on Machine Learning","author":"Clark C.","year":"2015"},{"key":"S0219622022500547BIB012","doi-asserted-by":"publisher","DOI":"10.1145\/203330.203343"},{"key":"S0219622022500547BIB013","first-page":"262","volume-title":"Int. Conf. Computers and Games","author":"Heinz E. A.","year":"2000"},{"issue":"2","key":"S0219622022500547BIB014","doi-asserted-by":"crossref","first-page":"57","DOI":"10.4236\/jilsa.2010.22009","volume":"2","author":"Wiering M. A.","year":"2010","journal-title":"Journal of Intelligent Learning Systems and Applications"},{"key":"S0219622022500547BIB015","first-page":"108","volume-title":"Adaptive Dynamic Programming and Reinforcement Learning","author":"Van Der Ree M.","year":"2013"},{"issue":"1","key":"S0219622022500547BIB018","first-page":"1046","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"34","author":"Wu T.-R.","year":"2020"},{"key":"S0219622022500547BIB020","first-page":"518","volume-title":"Proc. IEEE Conf. Computer Vision and Pattern Recognition","author":"Dong X.","year":"2018"},{"issue":"5","key":"S0219622022500547BIB021","doi-asserted-by":"crossref","first-page":"1515","DOI":"10.1109\/TPAMI.2019.2956703","volume":"43","author":"Dong X.","year":"2019","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"S0219622022500547BIB022","first-page":"132","volume-title":"2018 Conf. Technologies and Applications of Artificial Intelligence (TAAI)","author":"Mandai Y.","year":"2018"},{"key":"S0219622022500547BIB023","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007379606734"},{"issue":"3","key":"S0219622022500547BIB024","doi-asserted-by":"crossref","first-page":"294","DOI":"10.3233\/ICG-180060","volume":"40","author":"Matsuzaki K.","year":"2018","journal-title":"ICGA Journal"},{"key":"S0219622022500547BIB025","first-page":"142","volume-title":"2018 Conf. Technologies and Applications of Artificial Intelligence (TAAI)","author":"Matsuzaki K.","year":"2018"},{"issue":"11","key":"S0219622022500547BIB026","doi-asserted-by":"crossref","first-page":"4933","DOI":"10.1109\/TNNLS.2019.2959129","volume":"31","author":"Wu D.","year":"2020","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"2","key":"S0219622022500547BIB027","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/0304-3975(94)90131-7","volume":"123","author":"Iwata S.","year":"1994","journal-title":"Theoretical Computer Science"},{"key":"S0219622022500547BIB028","series-title":"Ellis Horwood series in artificial intelligence; Vol. 1","first-page":"113","volume-title":"Heuristic programming in artificial intelligence: The first Computer Olympiad","volume":"1","author":"Uiterwijk J. W. H. M."},{"key":"S0219622022500547BIB029","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/BF00288536","volume":"13","author":"Reisch S.","year":"1980","journal-title":"Acta Informatica"},{"key":"S0219622022500547BIB030","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1109\/SSCI44817.2019.9002814","volume-title":"2019 IEEE Symp. Series on Computational Intelligence (SSCI)","author":"Wang H.","year":"2019"},{"issue":"3","key":"S0219622022500547BIB031","doi-asserted-by":"crossref","first-page":"189","DOI":"10.3233\/ICG-1997-20311","volume":"20","author":"Buro M.","year":"1997","journal-title":"ICGA Journal"},{"key":"S0219622022500547BIB032","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2005.843750"},{"key":"S0219622022500547BIB033","first-page":"1","volume-title":"2014 IEEE Conf. Computational Intelligence and Games","author":"Thill M.","year":"2014"},{"key":"S0219622022500547BIB034","first-page":"051","volume":"7","author":"Zhang M. L.","year":"2012","journal-title":"Journal of Computer Applications"},{"key":"S0219622022500547BIB035","first-page":"672","volume-title":"IJCAI","author":"Banerjee B.","year":"2007"},{"key":"S0219622022500547BIB036","first-page":"138","volume-title":"Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science","volume":"1021","author":"Wang H.","year":"2019"},{"key":"S0219622022500547BIB037","first-page":"448","volume-title":"Proc. 32nd Int. Conf. Machine Learning","volume":"37","author":"Ioffe S.","year":"2015"},{"issue":"1","key":"S0219622022500547BIB039","first-page":"1929","volume":"15","author":"Srivastava N.","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"key":"S0219622022500547BIB040","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1007\/978-3-540-87608-3_11","volume-title":"Int. Conf. Computers and Games","author":"Coulom R.","year":"2008"},{"issue":"3","key":"S0219622022500547BIB041","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1007\/s11047-018-9685-y","volume":"17","author":"Emmerich M. T. M.","year":"2018","journal-title":"Natural Computing"},{"key":"S0219622022500547BIB044","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2020.113429"},{"key":"S0219622022500547BIB045","first-page":"1","volume-title":"IEEE Transactions on Cybernetics","author":"Li T.","year":"2021"},{"issue":"1","key":"S0219622022500547BIB046","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.ejor.2017.07.030","volume":"265","author":"Chao X.","year":"2018","journal-title":"European Journal of Operational Research"},{"key":"S0219622022500547BIB047","first-page":"11","volume-title":"Proc. 4th Annual Conf. Genetic and Evolutionary Computation","author":"Birattari M.","year":"2002"},{"key":"S0219622022500547BIB048","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1007\/978-3-642-25566-3_40","volume-title":"Int. Conf. Learning and Intelligent Optimization","author":"Hutter F.","year":"2011"}],"container-title":["International Journal of Information Technology & Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219622022500547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,17]],"date-time":"2023-02-17T06:49:06Z","timestamp":1676616546000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0219622022500547"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,24]]},"references-count":40,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1142\/S0219622022500547"],"URL":"https:\/\/doi.org\/10.1142\/s0219622022500547","relation":{},"ISSN":["0219-6220","1793-6845"],"issn-type":[{"value":"0219-6220","type":"print"},{"value":"1793-6845","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,24]]}}}