Title | Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards
Authors | Luo, Yongle [2,3]; Wang, Yuxin [2,3]; Dong, Kun [2,3]; Zhang, Qiang [2,3]; Cheng, Erkang [2,3]; Sun, Zhiyong [2,3]; Song, Bo [1,2,3]
Journal | NEUROCOMPUTING
Date | 2023-11-07
Volume | 557
Keywords | Deep reinforcement learning; Robotic manipulation; Continual learning; Hindsight experience replay; Sparse reward
ISSN | 0925-2312
DOI | 10.1016/j.neucom.2023.126620
Corresponding author | Song, Bo (songbo@iim.ac.cn)
Abstract | Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally only receives a reward upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, a Self-Guided Exploration Strategy (SGES) is proposed, which uses the already-learned policy of a simpler subtask to guide exploration of a more complex subtask. This strategy allows the agent to break the barriers of sparse-reward sequential tasks and learn efficiently stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks (FetchPush and FetchPickAndPlace). Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, TStack, etc.). Finally, the proposed RHER method can also learn a contact-rich task on a real robot from scratch within 250 episodes.
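The self-guided exploration idea described in the abstract can be illustrated with a minimal sketch. Everything below (the function name, the toy state dictionary, and the guidance probability) is an illustrative assumption, not the authors' implementation: the learned policy of the previous, simpler subtask is reused to drive exploration until that subtask is solved, after which the current stage's still-learning policy acts.

```python
import random

def sges_action(stage, state, policies, guide_prob=0.5):
    """Pick an action for the subtask at index `stage`.

    With probability `guide_prob`, while the preceding (already learned)
    subtask is not yet solved in `state`, reuse that subtask's policy to
    guide exploration; otherwise act with the current stage's policy.
    """
    if (stage > 0
            and not state["subtask_done"][stage - 1]
            and random.random() < guide_prob):
        # Guidance: the learned simpler policy moves the agent toward
        # states where the harder subtask becomes reachable.
        return policies[stage - 1](state)
    # Current (still-learning) policy explores the harder subtask.
    return policies[stage](state)
```

For example, with a two-stage push task decomposed into "reach" then "push", the reach policy would steer exploration whenever reaching is not yet accomplished in the current state.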
Funding projects | NSFC [61973294]; Anhui Provincial Key R&D Program [2022i01020020]; University Synergy Innovation Program of Anhui Province, China [GXXT-2021-030]
WOS research area | Computer Science
Language | English
Publisher | ELSEVIER
WOS accession number | WOS:001077267000001
Funding agencies | NSFC; Anhui Provincial Key R&D Program; University Synergy Innovation Program of Anhui Province, China
Content type | Journal article
Source URL | http://ir.hfcas.ac.cn:8080/handle/334002/132582
Collection | Hefei Institutes of Physical Science, Chinese Academy of Sciences
Affiliations | 1. Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei 230088, Peoples R China; 2. Univ Sci & Technol China, Hefei 230026, Peoples R China; 3. Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
Recommended citation (GB/T 7714) | Luo, Yongle, Wang, Yuxin, Dong, Kun, et al. Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards[J]. NEUROCOMPUTING, 2023, 557.
APA | Luo, Yongle, Wang, Yuxin, Dong, Kun, Zhang, Qiang, Cheng, Erkang, ... & Song, Bo. (2023). Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards. NEUROCOMPUTING, 557.
MLA | Luo, Yongle, et al. "Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards." NEUROCOMPUTING 557 (2023).