Abstract
Dynamic programming offers an exact, general solution method for completely known sequential decision problems with a finite number of states, formulated as Markov Decision Processes (MDPs). Recently, there has been a great deal of interest in the adaptive version of the problem, where the task to be solved is not completely known a priori. In this case, an agent must acquire the necessary knowledge through learning while simultaneously solving the optimal control or decision problem. A large variety of algorithms, variously known as Adaptive Dynamic Programming (ADP) or Reinforcement Learning (RL), have been proposed in the literature. Almost invariably, however, such algorithms converge slowly in terms of the number of experiments needed. In this paper we investigate how learning speed can be considerably improved by exploiting and combining the knowledge accumulated by multiple agents that operate in the same task environment but follow possibly different trajectories. We discuss methods of combining the knowledge structures associated with the multiple agents, along with different strategies (with varying overheads) for communicating knowledge between agents. Simulation results are presented indicating that combining multiple learning agents is a promising direction for improving learning speed; the method also performs significantly better than some of the fastest MDP learning algorithms, such as prioritized sweeping.
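The abstract does not detail the knowledge structures or the combination rules, so the following is only a minimal sketch of the general idea, under stated assumptions: several tabular Q-learning agents explore the same toy corridor MDP along independent trajectories and periodically average their Q-tables. The environment, the `merge` averaging rule, and parameters such as `sync_every` are illustrative inventions, not the authors' method.

```python
import random

# Toy MDP: a 1-D corridor of N_STATES cells; actions move left or right,
# and reaching the rightmost cell (the goal) yields reward 1.
N_STATES = 10
GOAL = N_STATES - 1
ACTIONS = (-1, +1)

def step(s, a_idx):
    s2 = min(max(s + ACTIONS[a_idx], 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

def greedy(Q, s):
    # Argmax with random tie-breaking, so untrained agents still explore.
    best = max(Q[s])
    return random.choice([a for a, v in enumerate(Q[s]) if v == best])

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Standard one-step Q-learning backup.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

def merge(q_tables):
    # Hypothetical combination rule: element-wise average of the agents'
    # Q-tables. The paper studies combination schemes and communication
    # strategies with varying overheads; averaging is just one stand-in.
    n = len(q_tables)
    return [[sum(Q[s][a] for Q in q_tables) / n for a in range(len(ACTIONS))]
            for s in range(N_STATES)]

def run(n_agents=3, episodes=100, sync_every=10, eps=0.1, max_steps=500):
    Qs = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in range(n_agents)]
    for ep in range(episodes):
        for Q in Qs:  # each agent follows its own trajectory
            s = 0
            for _ in range(max_steps):
                a = (random.randrange(len(ACTIONS))
                     if random.random() < eps else greedy(Q, s))
                s2, r = step(s, a)
                q_update(Q, s, a, r, s2)
                s = s2
                if s == GOAL:
                    break
        if (ep + 1) % sync_every == 0:
            merged = merge(Qs)  # knowledge-communication step
            Qs = [[row[:] for row in merged] for _ in range(n_agents)]
    return Qs[0]

if __name__ == "__main__":
    Q = run()
    # After training, the greedy policy should move right everywhere.
    print([greedy(Q, s) for s in range(GOAL)])
```

Even this crude averaging illustrates the intended mechanism: states explored by any one agent contribute value estimates to all agents at each synchronization, so fewer experiments per agent are needed before the greedy policy points toward the goal.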
References
Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to Act using Real-Time Dynamic Programming. Artificial Intelligence 72(1), 81–138 (1995)
Lashkari, Y., Metral, M., Maes, P.: Collaborative Interface Agents. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI 1994)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
Narendra, K.S., Balakrishnan, J.: Adaptive control using multiple models. IEEE Transactions on Automatic Control 42(2) (February 1997)
Tesauro, G.: TD-Gammon, a self-teaching backgammon program, achieves master level play. Neural Computation 6(2), 215–219 (1994)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Moore, A.W., Atkeson, C.G.: Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time. Machine Learning 13, 103–130 (1993)
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mukhopadhyay, S., Varghese, J. (2000). Multi-agent Adaptive Dynamic Programming. In: Cairó, O., Sucar, L.E., Cantu, F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 1793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720076_52
DOI: https://doi.org/10.1007/10720076_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67354-5
Online ISBN: 978-3-540-45562-2