Greedy feature replacement for online value function approximation
Feng-fei ZHAO ; Zheng QIN ; Zhuo SHAO ; Jun FANG ; Bo-yan REN
2016-03-30
Keywords: Reinforcement learning; Function approximation; Feature dependency; Online expansion; Feature replacement; TP181
Alternative title: Greedy feature replacement for online value function approximation
Abstract: Reinforcement learning (RL) in real-world problems requires function approximation, which in turn depends on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques work well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation, and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and a computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and is capable of handling large-scale problems.
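The abstract names three ingredients: a linear value function over binary features, candidate conjunctive features, and a TD-error statistic recorded per candidate. The sketch below shows one simple way these pieces could fit together. It is not the paper's GFR procedure: accumulating the absolute TD error for each pair of co-active features is an assumed stand-in for the "virtual TD error", and the function names, the step size `alpha`, the discount `gamma`, and the `threshold` are all hypothetical.

```python
from itertools import combinations


def td_update(weights, active, reward, next_active, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear value function over binary features.

    `active` and `next_active` are sets holding the indices of the features
    that equal 1 in the current and next state. Returns the TD error.
    """
    v = sum(weights[i] for i in active)
    v_next = sum(weights[i] for i in next_active)
    delta = reward + gamma * v_next - v
    # With binary features, the gradient is 1 for each active feature.
    for i in active:
        weights[i] += alpha * delta
    return delta


def record_pair_errors(pair_error, active, delta):
    """Accumulate |TD error| for every pair of co-active features.

    Assumption: this running sum stands in for the per-conjunction
    statistic mentioned in the abstract; the paper's definition may differ.
    """
    for pair in combinations(sorted(active), 2):
        pair_error[pair] = pair_error.get(pair, 0.0) + abs(delta)


def best_candidate(pair_error, threshold):
    """Return the pair with the largest accumulated error, or None if no
    pair exceeds the (hypothetical) threshold."""
    if not pair_error:
        return None
    pair, err = max(pair_error.items(), key=lambda kv: kv[1])
    return pair if err > threshold else None


if __name__ == "__main__":
    weights = [0.0, 0.0, 0.0]
    pair_error = {}
    delta = td_update(weights, {0, 1}, reward=1.0, next_active={2})
    record_pair_errors(pair_error, {0, 1}, delta)
    print(best_candidate(pair_error, threshold=0.5))
```

In a fuller version, a pair returned by `best_candidate` would become a new conjunctive feature (active only when both parents are active); how that feature then replaces existing ones is specific to the paper and is not reproduced here.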
Language: English
Content type: Journal article
Source URL: http://ir.lib.tsinghua.edu.cn/ir/item.do?handle=123456789/146795
Collection: Tsinghua University
Recommended citation
GB/T 7714
Feng-fei ZHAO, Zheng QIN, Zhuo SHAO, et al. Greedy feature replacement for online value function approximation[J]. 2016.
APA Feng-fei ZHAO, Zheng QIN, Zhuo SHAO, Jun FANG, & Bo-yan REN. (2016). Greedy feature replacement for online value function approximation.
MLA Feng-fei ZHAO, et al. "Greedy feature replacement for online value function approximation." (2016).