不完美的信息游戏中的声音算法

论文标题

不完美的信息游戏中的声音算法

Sound Algorithms in Imperfect Information Games

论文作者

Šustr, Michal, Schmid, Martin, Moravčík, Matej, Burch, Neil, Lanctot, Marc, Bowling, Michael

论文摘要

从一开始，搜索就在计算机游戏研究中发挥了基本作用。尽管在线搜索通常用于象棋和GO等完美信息游戏中，但仅仅引入了不完美的信息游戏的在线搜索方法。本文解决了什么是在两人零和游戏的不完善信息设置中，什么是声音在线算法。我们认为，可利用性和$ε$ -Nash Equilibria的固定策略〜定义不适合衡量在线算法的最差性能。因此，我们正式化了$ε$ soundness，这个概念将在线算法的最差性能与$ε$ -NASH平衡的性能联系起来。通常，由于$ε$ - 单调可能很难计算，因此我们引入了一个一致性框架 - 将在线算法的行为连接到NASH平衡的层次结构。这些多个级别的一致性描述了在线算法在什么意义上播放“就像固定的NASH平衡”。这些概念进一步说明了完美和不完美的信息设置之间的区别，因为相同的一致性保证在完美和不完美的信息游戏中具有不同的最差在线性能。声音的定义和一致性层次结构最终提供了适当的工具来分析重复不完美的信息游戏中的在线算法。因此，我们从新的角度检查了以前的一些在线算法，从而为他们最差的性能保证带来了新的见解。

Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question of what is a sound online algorithm in an imperfect information setting of two-player zero-sum games. We argue that the~fixed-strategy~definitions of exploitability and $ε$-Nash equilibria are ill-suited to measure an online algorithm's worst-case performance. We thus formalize $ε$-soundness, a concept that connects the worst-case performance of an online algorithm to the performance of an $ε$-Nash equilibrium. As $ε$-soundness can be difficult to compute in general, we introduce a consistency framework -- a hierarchy that connects an online algorithm's behavior to a Nash equilibrium. These multiple levels of consistency describe in what sense an online algorithm plays "just like a fixed Nash equilibrium". These notions further illustrate the difference between perfect and imperfect information settings, as the same consistency guarantees have different worst-case online performance in perfect and imperfect information games. The definitions of soundness and the consistency hierarchy finally provide appropriate tools to analyze online algorithms in repeated imperfect information games. We thus inspect some of the previous online algorithms in a new light, bringing new insights into their worst-case performance guarantees.

下载PDF全文

下载文献需遵守相关版权规定

论文标题