AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning | |
Zhao EM(赵恩民)2,3; Yan RY(闫仁业)2,3; Li JQ(李金秋)2,3; Li K(李凯)3; Xing JL(兴军亮)1,2,3 | |
2021-02 | |
Conference date | 2022-02-22 |
Conference location | Online |
DOI | None |
Abstract | Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus rely heavily on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to learn directly from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days of training. At the same time, AlphaHoldem takes only 2.9 milliseconds for each decision using a single GPU, more than 1,000 times faster than DeepStack. |
Language | English |
Content type | Conference paper |
Source URL | [http://ir.ia.ac.cn/handle/173211/52251] |
Collection | Fusion Innovation Center_Decision, Command, and System Intelligence (融合创新中心_决策指挥与体系智能) |
Author affiliations | 1. Tsinghua University 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Zhao EM, Yan RY, Li JQ, et al. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning[C]. In: . Online. 2022-02-22. |