Abstract
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches, as they typically rely on full knowledge of the environment. We therefore suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge of learning cooperative behavior, i.e., that agents often need to yield to each other to accomplish a mission, we use a mixing Q-network that complements the learning of individual policies. In the experimental evaluation, we show that this approach leads to plausible results and scales well to a large number of agents.
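The mixing Q-network mentioned in the abstract follows the QMIX idea of Rashid et al. (2018): per-agent Q-values are combined into a joint value Q_tot by a small mixing network whose weights are generated by hypernetworks conditioned on the global state, with the weights kept non-negative so that Q_tot is monotonic in each agent's Q-value. The following is a minimal PyTorch sketch of such a mixer, not the authors' exact architecture; the class name `QMixer` and hyperparameters such as `embed_dim` are illustrative assumptions.

```python
import torch
import torch.nn as nn


class QMixer(nn.Module):
    """QMIX-style monotonic mixing network (a sketch, not the paper's exact model).

    Hypernetworks map the global state to the weights of a two-layer mixing
    network; taking abs() of those weights enforces dQ_tot/dQ_a >= 0, so
    maximizing each agent's Q also maximizes the joint Q_tot.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: global state -> mixing-network weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        # abs() keeps the generated weights non-negative (monotonicity).
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(bs, 1)
```

For example, with 4 agents and a 64-dimensional global state, `QMixer(4, 64)` maps per-agent Q-values of shape (batch, 4) to a joint Q_tot of shape (batch, 1), which is then trained end-to-end with a standard TD loss.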
Cite this paper
Davydov, V., Skrynnik, A., Yakovlev, K., Panov, A.: Q-Mixing Network for Multi-agent Pathfinding in Partially Observable Grid Environments. In: Kovalev, S.M., Kuznetsov, S.O., Panov, A.I. (eds.) Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, vol. 12948. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86855-0_12
Print ISBN: 978-3-030-86854-3
Online ISBN: 978-3-030-86855-0