Paper Title
Sampling Equilibria: Fast No-Regret Learning in Structured Games
Paper Authors
Paper Abstract
Learning and equilibrium computation in games are fundamental problems across computer science and economics, with applications ranging from politics to machine learning. Much of the work in this area revolves around a simple algorithm termed \emph{randomized weighted majority} (RWM), also known as "Hedge" or "Multiplicative Weights Update," which is well known to achieve statistically optimal rates in adversarial settings (Littlestone and Warmuth '94, Freund and Schapire '99). Unfortunately, RWM comes with an inherent computational barrier: it requires maintaining and sampling from a distribution over all possible actions. In typical settings of interest the action space is exponentially large, seemingly rendering RWM useless in practice. In this work, we refute this notion for a broad variety of \emph{structured} games, showing it is possible to efficiently (approximately) sample the action space in RWM in \emph{polylogarithmic} time. This gives the first efficient no-regret algorithms for problems such as the \emph{(discrete) Colonel Blotto game}, \emph{matroid congestion}, \emph{matroid security}, and basic \emph{dueling games}. As an immediate corollary, we give a polylogarithmic time meta-algorithm to compute approximate Nash Equilibria for these games that is exponentially faster than prior methods in several important settings. Further, our algorithm is the first to efficiently compute equilibria for more involved variants of these games with general sums, more than two players, and, for Colonel Blotto, multiple resource types.
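For readers unfamiliar with the algorithm the abstract builds on, here is a minimal sketch of the classic RWM/Hedge update over an explicitly enumerated action space. The learning rate `eta` and the `[0, 1]` loss encoding are illustrative assumptions, not details from the paper; the weighted-sampling step is exactly the one that costs time linear in the number of actions, which is the barrier the paper circumvents for structured games.

```python
import random

def hedge(num_actions, loss_rounds, eta=0.1):
    """Textbook Hedge / randomized weighted majority (sketch).

    loss_rounds: iterable of per-round loss vectors, each of length
    num_actions with entries in [0, 1].  Returns the actions played
    and the final weight vector.
    """
    weights = [1.0] * num_actions
    plays = []
    for losses in loss_rounds:
        # Sampling proportionally to the weights takes Omega(n) over
        # an explicit action list -- for exponentially large action
        # spaces this is the bottleneck the abstract refers to.
        action = random.choices(range(num_actions), weights=weights)[0]
        plays.append(action)
        # Multiplicative update: actions with higher loss lose weight.
        weights = [w * (1.0 - eta) ** l for w, l in zip(weights, losses)]
    return plays, weights
```

In the structured games the paper studies (e.g., Colonel Blotto), `num_actions` is exponential in the natural problem size, so the point of the paper is to simulate the sampling step implicitly rather than maintain this weight vector explicitly.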