Abstract
This paper considers a new variant of the pursuit-evasion problem, the cooperative target defense problem, in which three agents (an attacker, a targeter, and a defender) move in 3D space. The targeter tries to fly from a starting point to a terminal point as quickly as possible, while the defender seeks to protect it from the attacker. The problem is difficult to solve with traditional game-theoretic methods, whereas deep reinforcement learning (DRL) has shown strong adaptability in such complex, high-dimensional tasks. Inspired by successful applications of Proximal Policy Optimization (PPO), this paper proposes a PPO-based algorithm for the problem, with the aim of deriving optimal behavioral policies for both sides. We design the state space, action space, and rewards for the agents. Three reward functions are proposed for the attacker and compared experimentally. Our study provides a foundation for studying the cooperative target defense problem with DRL.
Acknowledgment
This work was supported by the National Natural Science Foundation of China under Grant 61973244 and Grant 61573277. It was also supported by the Open Fund of the CETC Key Laboratory of Data Link Technology (CLDL-20202101-1).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xiong, Y., Wang, Z., Ke, L. (2022). A Deep Reinforcement Learning Approach for Cooperative Target Defense. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_2
DOI: https://doi.org/10.1007/978-981-19-9297-1_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science, Computer Science (R0)