Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game

Peixi Peng, Junliang Xing, Lili Cao, Lisen Mu, Chang Huang

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 1305-1311. https://doi.org/10.24963/ijcai.2019/181

The task in a real-time combat game is to coordinate multiple units to defeat the enemies controlled by a given opponent in a real-time combat scenario. It is difficult to design a high-level Artificial Intelligence (AI) program for such a task due to its extremely large state-action space and real-time requirements. This paper formulates the task as a collective decentralized partially observable Markov decision process, and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed that combines imitation learning from the opponent with reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that the proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.
Keywords:
Heuristic Search and Game Playing: Game Playing
Machine Learning Applications: Applications of Reinforcement Learning
Heuristic Search and Game Playing: Game Playing and Machine Learning
Machine Learning Applications: Game Playing
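
The abstract describes the training pipeline only at a high level. The sketch below is a minimal, hypothetical illustration of a generic two-stage scheme in PyTorch: behaviour cloning on opponent demonstrations, followed by a single policy-gradient update driven by a shared collective reward. The network shape, the dimensions, and the plain policy-gradient step are assumptions made for illustration; the paper's actual DDPN architecture and its no-regret-dynamics training are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions; placeholders for illustration only.
OBS_DIM, N_ACTIONS, HIDDEN = 32, 8, 64

class PolicyNet(nn.Module):
    """A minimal per-unit policy network (not the paper's DDPN architecture)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, N_ACTIONS),
        )

    def forward(self, obs):
        return self.body(obs)  # action logits

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# ---- Stage 1: imitation learning from opponent demonstrations ----
# Synthetic stand-ins for (observation, opponent action) pairs.
demo_obs = torch.randn(256, OBS_DIM)
demo_act = torch.randint(0, N_ACTIONS, (256,))
for _ in range(100):
    logits = policy(demo_obs)
    loss = F.cross_entropy(logits, demo_act)   # behaviour-cloning loss
    opt.zero_grad(); loss.backward(); opt.step()

# ---- Stage 2: reinforcement learning with a collective reward ----
# One simplified policy-gradient update; the paper instead trains by
# no-regret dynamics, which is not reproduced in this sketch.
obs = torch.randn(64, OBS_DIM)                 # observations of the units
dist = torch.distributions.Categorical(logits=policy(obs))
actions = dist.sample()
collective_reward = torch.randn(1)             # shared team reward (placeholder)
pg_loss = -(dist.log_prob(actions) * collective_reward).mean()
opt.zero_grad(); pg_loss.backward(); opt.step()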