Abstract
In non-communicative, dynamic environments, multi-agent reinforcement learning must adapt to the dynamics of the environment by transferring learning outcomes. Profit minimizing reinforcement learning with the oblivion of memory (PMRL-OM) enables agents to learn a cooperative policy using learning dynamics instead of communicated information, so that agents can adapt to changes in the other agents' behaviors without any designed relationships or communication rules between them. This makes it easy to add robots to a multi-robot system while maintaining cooperation. However, PMRL-OM can handle long-term dynamic changes but not short-term ones, because it relies on outcomes accumulated over a sufficient number of trials. This paper takes cyclic environmental changes as representative short-term changes and aims both to improve performance under such changes and to analyze the rationality of the approach theoretically. Specifically, we extend PMRL-OM based on an analysis of how it behaves. Our experiments evaluated the proposed method on a navigation task in a maze-type environment undergoing cyclic environmental change; the results show that it improves performance and adapts to the cyclic change sooner than the existing PMRL-OM. In addition, the theoretical analysis not only establishes the rationality of PMRL-OM but also suggests optimal parameter values for the proposed method. The proposed method contributes to explainable AI (XAI) by making the agents' profits explicit and by providing a rational justification for the approach.
This research was supported by JSPS KAKENHI Grant Number JP21K17807.
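The abstract does not include the algorithmic details of PMRL-OM or of the proposed extension. As a purely illustrative sketch of the setting it describes, the snippet below runs a standard tabular Q-learning agent in a small corridor maze whose goal position switches cyclically every few episodes, with a naive value-decay step standing in for a forgetting ("oblivion of memory") mechanism. All names and constants (CYCLE_LEN, DECAY, the reward scheme) are assumptions made for illustration; this is not the authors' PMRL-OM algorithm.

```python
import random
from collections import defaultdict

# Illustrative sketch only (not the paper's PMRL-OM algorithm): a tabular
# Q-learning agent in a 1-D corridor whose goal flips between the two ends
# every CYCLE_LEN episodes (a "cyclic environmental change"), plus a naive
# value-decay step standing in for a forgetting ("oblivion of memory") idea.
# All constants below are hypothetical choices for the sketch.

N_STATES = 10                      # corridor cells 0..9
ACTIONS = (-1, +1)                 # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
DECAY = 0.999                      # per-episode forgetting factor
CYCLE_LEN = 50                     # episodes between goal switches

Q = defaultdict(float)             # Q[(state, action)] -> value

def run_episode(goal):
    """One epsilon-greedy Q-learning episode toward the given goal cell."""
    s = N_STATES // 2
    for _ in range(100):
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == goal else 0.0
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next
        if r > 0:                  # reached the goal
            return

for episode in range(500):
    # The goal alternates between the two corridor ends every CYCLE_LEN episodes.
    goal = 0 if (episode // CYCLE_LEN) % 2 == 1 else N_STATES - 1
    run_episode(goal)
    for key in Q:                  # crude forgetting of stale outcomes
        Q[key] *= DECAY
```

In the paper's setting, multiple non-communicative agents, internal rewards, and the PMRL-OM update and forgetting rules would take the place of the plain Q-learning update and uniform decay used here.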
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Uwano, F., Takadama, K.: Reinforcement Learning in Cyclic Environmental Changes for Agents in Non-Communicative Environments: A Theoretical Approach. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2023. Lecture Notes in Computer Science, vol. 14127. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-40878-6_9