论文标题

多通溪流土匪的急剧记忆重新折叠权衡取舍

A Sharp Memory-Regret Trade-Off for Multi-Pass Streaming Bandits

论文作者

Agarwal, Arpit, Khanna, Sanjeev, Patil, Prathamesh

论文摘要

随机$ k $武装的强盗问题由于其在在线广告到临床试验的各个领域的应用而进行了广泛的研究。但是,实际上,武器的数量可能很大,从而产生了很大的记忆要求,以同时处理它们。在本文中,我们考虑了一个流式设置,该设置在流中呈现手臂,并且该算法使用有限的内存来处理这些手臂。在这里,目标不仅是为了最大程度地减少遗憾,而且是在最小的记忆中这样做。此问题的先前算法在两个设置之一中运行:它们要么使用$ω(\ log \ log t)$通过流(Rathod,2021; Chaudhuri和Kalyanakrishnan,2020; Liau et al。,2018),或者只有一个Pass(Maiti等人,2021年)。 在本文中,我们研究了允许任何$ b \ geq 1 $ $ b $通过流的$ b $通过时的记忆与遗憾之间的权衡,并为任何$ b $ pass算法建立紧张的遗憾上和下限。 Our results uncover a surprising *sharp transition phenomenon*: $O(1)$ memory is sufficient to achieve $\widetildeΘ\Big(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\Big)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost除非我们使用$ω(k)$内存,否则对进一步减少这种遗憾没有影响。我们的主要技术贡献是我们的下限,它需要使用信息理论技术以及圆形消除的想法,以表明 *残留问题 *在随后的通行证中仍然具有挑战性。

The stochastic $K$-armed bandit problem has been studied extensively due to its applications in various domains ranging from online advertising to clinical trials. In practice however, the number of arms can be very large resulting in large memory requirements for simultaneously processing them. In this paper we consider a streaming setting where the arms are presented in a stream and the algorithm uses limited memory to process these arms. Here, the goal is not only to minimize regret, but also to do so in minimal memory. Previous algorithms for this problem operate in one of the two settings: they either use $Ω(\log \log T)$ passes over the stream (Rathod, 2021; Chaudhuri and Kalyanakrishnan, 2020; Liau et al., 2018), or just a single pass (Maiti et al., 2021). In this paper we study the trade-off between memory and regret when $B$ passes over the stream are allowed, for any $B \geq 1$, and establish tight regret upper and lower bounds for any $B$-pass algorithm. Our results uncover a surprising *sharp transition phenomenon*: $O(1)$ memory is sufficient to achieve $\widetildeΘ\Big(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\Big)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost no impact on further reducing this regret, unless we use $Ω(K)$ memory. Our main technical contribution is our lower bound which requires the use of information-theoretic techniques as well as ideas from round elimination to show that the *residual problem* remains challenging over subsequent passes.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源