Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game
Peixi Peng1; Junliang Xing1; Lili Cao1; Lisen Mu2; Chang Huang2
Year: 2019
Conference Date: August 10-16, 2019
Conference Venue: Macao, China
Keywords: Multi-agent Learning; Deep Decentralized Policy Network; Real-time Combat Game
Abstract

The task of a real-time combat game is to coordinate multiple units to defeat the enemies controlled by a given opponent in a real-time combat scenario. It is difficult to design a high-level Artificial Intelligence (AI) program for such a task due to its extremely large state-action space and real-time requirements. This paper formulates the task as a collective decentralized partially observable Markov decision process and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed which combines imitation learning from the opponent with reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that the proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.
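The abstract only outlines the approach; as a rough illustration of the core idea of decentralized policies trained from a collective reward, the sketch below shows a policy network shared across units, executed on each unit's local observation, and updated with a single team return. This is not the paper's DDPN: the actual architecture, the imitation-learning stage, and the no-regret training dynamics are omitted, and all names, dimensions, and the REINFORCE-style update are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's DDPN): one policy network shared by
# all units, applied to each unit's local observation (decentralized execution),
# and updated with a single collective team return shared by the whole group.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, N_UNITS = 32, 6, 5  # hypothetical sizes

class SharedUnitPolicy(nn.Module):
    """Maps one unit's local observation to a distribution over its actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs):                      # obs: (n_units, OBS_DIM)
        return torch.distributions.Categorical(logits=self.net(obs))

policy = SharedUnitPolicy()
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(episode):
    """REINFORCE-style update on a recorded episode.
    episode: list of (obs_t, actions_t, team_return_t) tuples, where obs_t is
    (n_units, OBS_DIM), actions_t is (n_units,), and team_return_t is the
    discounted *collective* return from step t. Every unit's log-probability
    is weighted by the same team return, so credit is shared across the group."""
    loss = 0.0
    for obs_t, actions_t, ret_t in episode:
        dist = policy(obs_t)
        loss = loss - dist.log_prob(actions_t).sum() * ret_t
    optim.zero_grad()
    loss.backward()
    optim.step()

# Toy rollout with random tensors standing in for game observations and the
# team reward; in a real game loop the environment would supply both.
episode = []
for _ in range(10):
    obs = torch.randn(N_UNITS, OBS_DIM)
    with torch.no_grad():
        actions = policy(obs).sample()           # decentralized: one action per unit
    episode.append((obs, actions, float(torch.randn(()))))
reinforce_update(episode)
```

In this toy setup, parameter sharing keeps the number of weights independent of the unit count, while the common return is what makes the reward "collective"; the paper's two-stage training (imitation learning followed by no-regret reinforcement learning) would replace the plain policy-gradient update used here.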

Content Type: Conference Paper
Source URL: http://ir.ia.ac.cn/handle/173211/26156
Collection: Institute of Automation, Chinese Academy of Sciences
Corresponding Author: Junliang Xing
Author Affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. Horizon Robotics
Recommended Citation (GB/T 7714):
Peixi Peng, Junliang Xing, Lili Cao, et al. Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game[C]. In: . Macao, China, August 10-16, 2019.