Abstract
Monte Carlo Tree Search (MCTS) has become a widely popular sampled-based search algorithm for two-player games with perfect information. When actions are chosen simultaneously, players may need to mix between their strategies. In this paper, we discuss the adaptation of MCTS to simultaneous move games. We introduce a new algorithm, Online Outcome Sampling (OOS), that approaches a Nash equilibrium strategy over time. We compare both head-to-head performance and exploitability of several MCTS variants in Goofspiel. We show that regret matching and OOS perform best and that all variants produce less exploitable strategies than UCT.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331 (1995)
Auger, D.: Multiple tree for partially observable Monte-Carlo tree search. In: Di Chio, C., et al. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 53–62. Springer, Heidelberg (2011)
Bosansky, B., Lisy, V., Cermak, J., Vitek, R., Pechoucek, M.: Using double-oracle method and serialized alpha-beta search for pruning in simultaneous moves games. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pp. 48–54 (2013)
Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Buro, M.: Solving the Oshi-Zumo game. In: Van Den Herik, H.J., Iida, H., Heinz, E.A. (eds.) Advances in Computer Games. IFIP, vol. 135, pp. 361–366. Springer, Heidelberg (2003)
Cazenave, T., Saffidine, A.: Score bounded Monte-Carlo tree search. In: van den Herik, H.J., Iida, H., Plaat, A. (eds.) CG 2010. LNCS, vol. 6515, pp. 93–104. Springer, Heidelberg (2011)
Chaslot, G.M.J.B., Winands, M.H.M., Uiterwijk, J.W.H.M., van den Herik, H.J., Bouzy, B.: Progressive strategies for Monte-Carlo tree search. New Math. Nat. Comput. 4(3), 343–357 (2008)
Couetoux, A., Hoock, J.-B., Sokolovska, N., Teytaud, O., Bonnard, N.: Continuous upper confidence trees. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 433–445. Springer, Heidelberg (2011)
Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.J. (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007)
Cowling, P.I., Powley, E.J., Whitehouse, D.: Information set Monte Carlo tree search. IEEE Trans. Comput. Intell. AI Games 4(2), 120–143 (2012)
Finnsson, H.: Cadia-player: a general game playing agent. Master’s thesis, Reykjavík University (2007)
Finnsson, H.: Simulation-based general game playing. Ph.D. thesis, Reykjavík University (2012)
Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., Teytaud, O.: The grand challenge of computer go: Monte Carlo tree search and extensions. Commun. ACM 55(3), 106–113 (2012)
Gibson, R., Lanctot, M., Burch, N., Szafron, D., Bowling, M.: Generalized sampling and variance in counterfactual regret minimization. In: Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12), pp. 1355–1361 (2012)
Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5), 1127–1150 (2000)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
Lanctot, M., Waugh, K., Bowling, M., Zinkevich, M.: Sampling for regret minimization in extensive games. In: Advances in Neural Information Processing Systems (NIPS 2009), pp. 1078–1086 (2009)
Lanctot, M.: Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games. Ph.D. thesis, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada (2013)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In. Proceedings of the Eleventh International Conference on Machine Learning, pp. 157–163. Morgan Kaufmann (1994)
Perick, P., St-Pierre, D.L., Maes, F., Ernst, D.: Comparison of different selection strategies in Monte-Carlo tree search for the game of Tron. In: Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), pp. 242–249 (2012)
Rhoads, G.C., Bartholdi, L.: Computer solution to the game of pure strategy. Games 3(4), 150–156 (2012)
Ross, S.M.: Goofspiel – the game of pure strategy. J. Appl. Probab. 8(3), 621–625 (1971)
Saffidine, A., Finnsson, H., Buro, M.: Alpha-beta pruning for games with simultaneous moves. In: Proceedings of the Thirty-Second Conference on Artificial Intelligence (AAAI-12), pp. 556–562 (2012)
Samothrakis, S., Robles, D., Lucas, S.M.: A UCT agent for Tron: initial investigations. In: Proceedings of the 2010 IEEE Symposium on Computational Intelligence and Games (CIG), pp. 365–371 (2010)
Shafiei, M., Sturtevant, N.R., Schaeffer, J.: Comparing UCT versus CFR in simultaneous games. In: Proceeding of the IJCAI Workshop on General Game-Playing (GIGA), pp. 75–82 (2009)
Teytaud, O., Flory, S.: Upper confidence trees with short term partial information. In: Di Chio, C., et al. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 153–162. Springer, Heidelberg (2011)
Winands, M.H.M., Björnsson, Y., Saito, J.-T.: Monte-Carlo tree search solver. In: van den Herik, H.J., Xu, X., Ma, Z., Winands, M.H.M. (eds.) CG 2008. LNCS, vol. 5131, pp. 25–36. Springer, Heidelberg (2008)
Zinkevich, M., Johanson, M., Bowling, M., Piccione, C.: Regret minimization in games with incomplete information. In: Advances in Neural Information Processing Systems 20 (NIPS 2007), pp. 905–912 (2008)
Acknowledgments
We would like to thank Laurent Bartholdi for sharing his code for solving Goofspiel. We would also like to thank Olivier Teytaud for advice in optimizing Exp3. This work is partially funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project Go4Nature, grant number 612.000.938 and the Czech Science Foundation, grant no. P202/12/2054.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lanctot, M., Lisý, V., Winands, M.H.M. (2014). Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel. In: Cazenave, T., Winands, M., Iida, H. (eds) Computer Games. CGW 2013. Communications in Computer and Information Science, vol 408. Springer, Cham. https://doi.org/10.1007/978-3-319-05428-5_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-05428-5_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05427-8
Online ISBN: 978-3-319-05428-5
eBook Packages: Computer ScienceComputer Science (R0)