Abstract
In non-communicative, dynamic environments, multi-agent reinforcement learning must adapt to the dynamics of the environment by transferring learning outcomes. Profit minimizing reinforcement learning with the oblivion of memory (PMRL-OM) enables agents to learn a cooperative policy using learning dynamics instead of communicated information, so that agents can adapt to changes in the other agents' behaviors without any designed relationships or communication rules between them. This makes it easy to add robots to a multi-robot system while maintaining cooperation. However, PMRL-OM can handle long-term dynamic changes but not short-term ones, because it relies on outcomes accumulated over a sufficient number of trials. This paper takes cyclic environmental changes as representative short-term changes and aims both to improve performance under such changes and to analyze the rationality of the approach theoretically. Specifically, we extend PMRL-OM based on an analysis of how it behaves. Our experiments evaluated the proposed method on a navigation task in a maze-type environment undergoing cyclic environmental change; the results show that it improves performance and adapts to the cyclic change sooner than the existing PMRL-OM. In addition, the theoretical analysis not only establishes the rationality of PMRL-OM but also suggests optimal parameter values for the proposed method. The proposed method contributes to explainable AI (XAI) by making the agents' profits explicit and by providing a rational justification for the approach.
This research was supported by JSPS KAKENHI Grant Number JP21K17807.
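The abstract does not include the algorithmic details of PMRL-OM or of the proposed extension. As a purely illustrative sketch of the setting it describes, the snippet below runs a standard tabular Q-learning agent in a small corridor maze whose goal position switches cyclically every few episodes, with a naive value-decay step standing in for a forgetting ("oblivion of memory") mechanism. All names and constants (CYCLE_LEN, DECAY, the reward scheme) are assumptions made for illustration; this is not the authors' PMRL-OM algorithm.

```python
import random
from collections import defaultdict

# Illustrative sketch only (not the paper's PMRL-OM algorithm): a tabular
# Q-learning agent in a 1-D corridor whose goal flips between the two ends
# every CYCLE_LEN episodes (a "cyclic environmental change"), plus a naive
# value-decay step standing in for a forgetting ("oblivion of memory") idea.
# All constants below are hypothetical choices for the sketch.

N_STATES = 10                      # corridor cells 0..9
ACTIONS = (-1, +1)                 # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
DECAY = 0.999                      # per-episode forgetting factor
CYCLE_LEN = 50                     # episodes between goal switches

Q = defaultdict(float)             # Q[(state, action)] -> value

def run_episode(goal):
    """One epsilon-greedy Q-learning episode toward the given goal cell."""
    s = N_STATES // 2
    for _ in range(100):
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == goal else 0.0
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next
        if r > 0:                  # reached the goal
            return

for episode in range(500):
    # The goal alternates between the two corridor ends every CYCLE_LEN episodes.
    goal = 0 if (episode // CYCLE_LEN) % 2 == 1 else N_STATES - 1
    run_episode(goal)
    for key in Q:                  # crude forgetting of stale outcomes
        Q[key] *= DECAY
```

In the paper's setting, multiple non-communicative agents, internal rewards, and the PMRL-OM update and forgetting rules would take the place of the plain Q-learning update and uniform decay used here.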
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Uwano, F., Takadama, K.: Reinforcement Learning in Cyclic Environmental Changes for Agents in Non-Communicative Environments: A Theoretical Approach. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2023. Lecture Notes in Computer Science, vol. 14127. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-40878-6_9