AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

DOI: none


	AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning
	Zhao EM(赵恩民)2,3; Yan RY(闫仁业)2,3; Li JQ(李金秋)2,3; Li K(李凯)3; Xing JL(兴军亮)1,2,3
	2021-02
Conference date	2022-02-22
Conference location	Online
DOI	None
Abstract	Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes 2.9 milliseconds for each decision-making using only a single GPU, more than 1,000 times faster than DeepStack.
Language	English
Content type	Conference paper
Source URL	[http://ir.ia.ac.cn/handle/173211/52251]
Collection	融合创新中心_决策指挥与体系智能
Author affiliations	1. Tsinghua University 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Zhao EM, Yan RY, Li JQ, et al. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning[C]. In: . Online. 2022-02-22.
