论文标题
主动缓存同步的在线学习
Online Learning for Active Cache Synchronization
论文作者
论文摘要
现有的多臂强盗(MAB)模型做出了两个隐式假设:只有在播放时,ARM才会产生回报,而代理商观察到生成的每一个收益。本文介绍了同步匪徒,这是一个单元素的变体,所有武器始终产生成本,但是该经纪人只有在弹奏手臂时才会观察到手臂的瞬时成本。同步mAB的灵感来自在线缓存方案,例如Web Crawling,其中手臂对应于缓存的物品并播放手臂,意味着从服务器下载其新副本。我们提出了一种用于同步土匪的在线学习算法的MirrorSync,为其建立了$ O(T^{2/3})$的对抗性遗憾,并展示如何使其实用。
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated. This paper introduces synchronization bandits, a MAB variant where all arms generate costs at all times, but the agent observes an arm's instantaneous cost only when the arm is played. Synchronization MABs are inspired by online caching scenarios such as Web crawling, where an arm corresponds to a cached item and playing the arm means downloading its fresh copy from a server. We present MirrorSync, an online learning algorithm for synchronization bandits, establish an adversarial regret of $O(T^{2/3})$ for it, and show how to make it practical.