Abstract
In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bikramjit Banerjee, Sandip Sen, and Jing Peng. Fast concurrent reinforcement learners. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, 2001.
M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Artificial Intelligence, 2002. In Press.
Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 746–752, Menlo Park, CA, 1998. AAAI Press/MIT Press.
J. Hu and M. P. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proc. of the 15th Int. Conf. on Machine Learning (ML’98), pages 242–250, San Francisco, CA, 1998. Morgan Kaufmann.
M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proc. of the 11th Int. Conf. on Machine Learning, pages 157–163, San Mateo, CA, 1994. Morgan Kaufmann.
John F. Nash. Non-cooperative games. Annals of Mathematics, 54:286–295, 1951.
G. Owen. Game Theory. Academic Press, UK, 1995.
S. Singh, M. Kearns, and Y. Mansour. Nash convergence of gradient dynamics in general-sum games. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 541–548, 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Banerjee, B., Peng, J. (2002). Convergent Gradient Ascent in General-Sum Games. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Machine Learning: ECML 2002. ECML 2002. Lecture Notes in Computer Science(), vol 2430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36755-1_1
Download citation
DOI: https://doi.org/10.1007/3-540-36755-1_1
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44036-9
Online ISBN: 978-3-540-36755-0
eBook Packages: Springer Book Archive