Abstract
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches, as they typically rely on full knowledge of the environment. We therefore suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge of learning cooperative behavior, i.e., that agents often need to yield to each other to accomplish a mission, we use a mixing Q-network that complements the learning of individual policies. In the experimental evaluation, we show that this approach leads to plausible results and scales well to a large number of agents.
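The mixing Q-network mentioned in the abstract follows the QMIX idea of Rashid et al. (2018): per-agent Q-values are combined into a joint value Q_tot by a small mixing network whose weights are generated by hypernetworks conditioned on the global state, with the weights kept non-negative so that Q_tot is monotonic in each agent's Q-value. The following is a minimal PyTorch sketch of such a mixer, not the authors' exact architecture; the class name `QMixer` and hyperparameters such as `embed_dim` are illustrative assumptions.

```python
import torch
import torch.nn as nn


class QMixer(nn.Module):
    """QMIX-style monotonic mixing network (a sketch, not the paper's exact model).

    Hypernetworks map the global state to the weights of a two-layer mixing
    network; taking abs() of those weights enforces dQ_tot/dQ_a >= 0, so
    maximizing each agent's Q also maximizes the joint Q_tot.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: global state -> mixing-network weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        # abs() keeps the generated weights non-negative (monotonicity).
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(bs, 1)
```

For example, with 4 agents and a 64-dimensional global state, `QMixer(4, 64)` maps per-agent Q-values of shape (batch, 4) to a joint Q_tot of shape (batch, 1), which is then trained end-to-end with a standard TD loss.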
Cite this paper
Davydov, V., Skrynnik, A., Yakovlev, K., Panov, A.: Q-Mixing Network for Multi-agent Pathfinding in Partially Observable Grid Environments. In: Kovalev, S.M., Kuznetsov, S.O., Panov, A.I. (eds.) Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, vol. 12948. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86855-0_12
Print ISBN: 978-3-030-86854-3
Online ISBN: 978-3-030-86855-0