Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards
Authors: Luo, Yongle [2,3]; Wang, Yuxin [2,3]; Dong, Kun [2,3]; Zhang, Qiang [2,3]; Cheng, Erkang [2,3]; Sun, Zhiyong [2,3]; Song, Bo [1,2,3]
Journal: NEUROCOMPUTING
Publication date: 2023-11-07
Volume: 557
Keywords: Deep reinforcement learning; Robotic manipulation; Continual learning; Hindsight experience replay; Sparse reward
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2023.126620
Corresponding author: Song, Bo (songbo@iim.ac.cn)
Abstract: Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally receives a reward only upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, a Self-Guided Exploration Strategy (SGES) is proposed, which uses the already-learned policy of a simpler subtask to guide exploration of a more complex subtask. This strategy allows the agent to overcome the barrier of sparse rewards in sequential tasks and learn efficiently stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks FetchPush and FetchPickAndPlace. Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, TStack). Finally, the proposed RHER method can also learn a contact-rich task on a real robot from scratch within 250 episodes.
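The mechanism the abstract describes, subtask decomposition plus a Self-Guided Exploration Strategy layered on hindsight goal relabeling, can be illustrated with a minimal sketch. The toy 1-D reach-then-push environment, the guide_prob parameter, and every function name below are illustrative assumptions made for exposition; they are not the authors' implementation or the paper's benchmark environments.

```python
import random

# Minimal sketch of the RHER idea summarized in the abstract: a sequential task
# is split into subtasks of increasing complexity, and the already-learned
# simpler subtask policy guides exploration of the harder one (SGES), with
# HER-style goal relabeling on top. The toy environment and all names are
# illustrative assumptions, not the paper's code.

class ToyReachPush:
    """1-D world: the gripper must first reach the object (subtask 0),
    then push it to the goal position (subtask 1). Rewards are sparse."""
    horizon = 30

    def reset(self):
        self.gripper = 0.0
        self.obj = random.uniform(2.0, 4.0)
        self.goal = random.uniform(6.0, 8.0)
        return self._obs()

    def _obs(self):
        return (self.gripper, self.obj, self.goal)

    def step(self, action):
        self.gripper += max(-1.0, min(1.0, action))
        if abs(self.gripper - self.obj) < 0.5:   # contact drags the object along
            self.obj = self.gripper
        success = abs(self.obj - self.goal) < 0.5
        reward = 0.0 if success else -1.0        # sparse reward: only full success pays
        return self._obs(), reward, success

def reach_policy(obs):
    """Stand-in for an already-learned subtask-0 policy: move gripper to object."""
    gripper, obj, _ = obs
    return obj - gripper

def push_policy(obs):
    """Subtask-1 policy still being learned; here just a random explorer."""
    return random.uniform(-1.0, 1.0)

def sges_rollout(env, guide_prob=0.5):
    """Self-guided exploration: with some probability, let the learned simpler
    subtask policy (reach) drive the agent until it touches the object, so the
    harder subtask (push) is explored from useful, contact-rich states."""
    obs = env.reset()
    episode = []
    for _ in range(env.horizon):
        gripper, obj, goal = obs
        guided = abs(gripper - obj) >= 0.5 and random.random() < guide_prob
        action = reach_policy(obs) if guided else push_policy(obs)
        next_obs, reward, done = env.step(action)
        episode.append((obs, action, reward, next_obs, goal))
        obs = next_obs
        if done:
            break
    return episode

def her_relabel(episode):
    """HER-style relabeling: treat the finally achieved object position as the
    desired goal, so even failed episodes yield a non-trivial learning signal."""
    achieved = episode[-1][3][1]                 # object position in the last next_obs
    return [(o, a, 0.0 if abs(o2[1] - achieved) < 0.5 else -1.0, o2, achieved)
            for (o, a, r, o2, g) in episode]

if __name__ == "__main__":
    env = ToyReachPush()
    ep = sges_rollout(env)
    relabeled = her_relabel(ep)
    print(f"episode length: {len(ep)}, original return: {sum(t[2] for t in ep):.0f}, "
          f"relabeled return: {sum(t[2] for t in relabeled):.0f}")
```

In this sketch the relabeled return is never worse than the original one, which is the point of combining the stage-wise guidance with hindsight relabeling: exploration reaches informative states, and relabeling converts otherwise reward-free trajectories into usable training data.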
Funding projects: NSFC [61973294]; Anhui Provincial Key R&D Program [2022i01020020]; University Synergy Innovation Program of Anhui Province, China [GXXT-2021-030]
WOS research area: Computer Science
Language: English
Publisher: ELSEVIER
WOS accession number: WOS:001077267000001
Funding agencies: NSFC; Anhui Provincial Key R&D Program; University Synergy Innovation Program of Anhui Province, China
Content type: Journal article
Source URL: [http://ir.hfcas.ac.cn:8080/handle/334002/132582]
Collection: Hefei Institutes of Physical Science, Chinese Academy of Sciences
Author affiliations:
1. Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei 230088, Peoples R China
2. Univ Sci & Technol China, Hefei 230026, Peoples R China
3. Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
Recommended citation:
GB/T 7714
Luo, Yongle, Wang, Yuxin, Dong, Kun, et al. Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards[J]. NEUROCOMPUTING, 2023, 557.
APA Luo, Yongle., Wang, Yuxin., Dong, Kun., Zhang, Qiang., Cheng, Erkang., ... & Song, Bo. (2023). Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards. NEUROCOMPUTING, 557.
MLA Luo, Yongle, et al. "Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards". NEUROCOMPUTING 557 (2023).