Title | Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards
Authors | Luo, Yongle [2,3]; Wang, Yuxin [2,3]; Dong, Kun [2,3]; Zhang, Qiang [2,3]; Cheng, Erkang [2,3]; Sun, Zhiyong [2,3]; Song, Bo [1,2,3]
Journal | NEUROCOMPUTING
Date | 2023-11-07
Volume | 557
Keywords | Deep reinforcement learning; Robotic manipulation; Continual learning; Hindsight experience replay; Sparse reward
ISSN | 0925-2312
DOI | 10.1016/j.neucom.2023.126620
Corresponding author | Song, Bo (songbo@iim.ac.cn)
Abstract | Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally only receives a reward upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, a Self-Guided Exploration Strategy (SGES) is proposed, which uses the already-learned policy of a simpler subtask to guide exploration of a more complex subtask. This strategy allows the agent to break the barriers of sparse-reward sequential tasks and learn efficiently stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks (FetchPush and FetchPickAndPlace). Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, TStack, etc.). Finally, the proposed RHER method can also learn a contact-rich task on a real robot from scratch within 250 episodes.
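The self-guided exploration idea described in the abstract can be illustrated with a minimal sketch. Everything below (the function name, the toy state dictionary, and the guidance probability) is an illustrative assumption, not the authors' implementation: the learned policy of the previous, simpler subtask is reused to drive exploration until that subtask is solved, after which the current stage's still-learning policy acts.

```python
import random

def sges_action(stage, state, policies, guide_prob=0.5):
    """Pick an action for the subtask at index `stage`.

    With probability `guide_prob`, while the preceding (already learned)
    subtask is not yet solved in `state`, reuse that subtask's policy to
    guide exploration; otherwise act with the current stage's policy.
    """
    if (stage > 0
            and not state["subtask_done"][stage - 1]
            and random.random() < guide_prob):
        # Guidance: the learned simpler policy moves the agent toward
        # states where the harder subtask becomes reachable.
        return policies[stage - 1](state)
    # Current (still-learning) policy explores the harder subtask.
    return policies[stage](state)
```

For example, with a two-stage push task decomposed into "reach" then "push", the reach policy would steer exploration whenever reaching is not yet accomplished in the current state.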
Funding projects | NSFC [61973294]; Anhui Provincial Key R&D Program [2022i01020020]; University Synergy Innovation Program of Anhui Province, China [GXXT-2021-030]
WOS research area | Computer Science
Language | English
Publisher | ELSEVIER
WOS accession number | WOS:001077267000001
Funding agencies | NSFC; Anhui Provincial Key R&D Program; University Synergy Innovation Program of Anhui Province, China
Content type | Journal article
Source URL | http://ir.hfcas.ac.cn:8080/handle/334002/132582
Collection | Hefei Institutes of Physical Science, Chinese Academy of Sciences
Affiliations | 1. Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei 230088, Peoples R China; 2. Univ Sci & Technol China, Hefei 230026, Peoples R China; 3. Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
Recommended citation (GB/T 7714) | Luo, Yongle, Wang, Yuxin, Dong, Kun, et al. Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards[J]. NEUROCOMPUTING, 2023, 557.
APA | Luo, Yongle, Wang, Yuxin, Dong, Kun, Zhang, Qiang, Cheng, Erkang, ... & Song, Bo. (2023). Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards. NEUROCOMPUTING, 557.
MLA | Luo, Yongle, et al. "Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards." NEUROCOMPUTING 557 (2023).