论文标题

非固定匪徒带有背包

Non-stationary Bandits with Knapsacks

论文作者

Liu, Shang, Jiang, Jiashuo, Li, Xiaocheng

论文摘要

在本文中,我们研究了非固定环境中带有背包(BWK)土匪的问题。 BWK问题概括了多臂强盗(MAB)问题,以建模与玩每个ARM相关的资源消耗。每次,决策者/播放器选择玩一只手臂,他/她将获得奖励,并从多种资源类型中的每种资源中消耗一定数量的资源。目的是在有限的地平线上最大化累积奖励受到资源的某些背包约束。现有作品在随机或对抗环境下研究BWK问题。我们的论文考虑了一个非平稳的环境,该环境在这两个极端之间连续插值。我们首先表明,由于存在约束,传统的变化预算概念不足以表征BWK问题的非平稳性,以使人遗憾,然后我们提出了一个新的全球非平稳性测量概念。我们采用两种非平稳性措施来得出问题的上限和下限。我们的结果基于对基本线性程序的原始二偶分析,并突出了约束与非平稳性之间的相互作用。最后,我们还将非平稳性度量扩展到了在线凸出优化的问题,并相应地获得了新的后悔界限。

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses to play an arm, and s/he will receive a reward and consume certain amount of resource from each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or adversarial environment. Our paper considers a non-stationary environment which continuously interpolates between these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to the presence of the constraints, and then we propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we also extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源