非固定匪徒带有背包

论文标题

非固定匪徒带有背包

Non-stationary Bandits with Knapsacks

论文作者

Liu, Shang, Jiang, Jiashuo, Li, Xiaocheng

论文摘要

在本文中，我们研究了非固定环境中带有背包（BWK）土匪的问题。 BWK问题概括了多臂强盗（MAB）问题，以建模与玩每个ARM相关的资源消耗。每次，决策者/播放器选择玩一只手臂，他/她将获得奖励，并从多种资源类型中的每种资源中消耗一定数量的资源。目的是在有限的地平线上最大化累积奖励受到资源的某些背包约束。现有作品在随机或对抗环境下研究BWK问题。我们的论文考虑了一个非平稳的环境，该环境在这两个极端之间连续插值。我们首先表明，由于存在约束，传统的变化预算概念不足以表征BWK问题的非平稳性，以使人遗憾，然后我们提出了一个新的全球非平稳性测量概念。我们采用两种非平稳性措施来得出问题的上限和下限。我们的结果基于对基本线性程序的原始二偶分析，并突出了约束与非平稳性之间的相互作用。最后，我们还将非平稳性度量扩展到了在线凸出优化的问题，并相应地获得了新的后悔界限。

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses to play an arm, and s/he will receive a reward and consume certain amount of resource from each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or adversarial environment. Our paper considers a non-stationary environment which continuously interpolates between these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to the presence of the constraints, and then we propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we also extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.

下载PDF全文

下载文献需遵守相关版权规定

论文标题