Title
An Index Policy for Minimizing the Uncertainty-of-Information of Markov Sources
Authors
Abstract
This paper focuses on the information freshness of finite-state Markov sources, using the uncertainty of information (UoI) as the performance metric. Measured by Shannon's entropy, UoI can capture not only the transition dynamics of the Markov source but also the different evolutions of information quality caused by the different values of the last observation. We consider an information update system with M finite-state Markov sources transmitting information to a remote monitor via m communication channels. Our goal is to explore the optimal scheduling policy to minimize the sum-UoI of the Markov sources. The problem is formulated as a restless multi-armed bandit (RMAB). We relax the RMAB and then decouple the relaxed problem into M single bandit problems. Analyzing the single bandit problem provides useful properties with which the relaxed problem reduces to maximizing a concave and piecewise linear function, allowing us to develop a gradient method to solve the relaxed problem and obtain its optimal policy. By rounding up the optimal policy for the relaxed problem, we obtain an index policy for the original RMAB problem. Notably, the proposed index policy is universal in the sense that it applies to general RMABs with bounded cost functions.
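To make the UoI metric concrete: for a finite-state Markov source with transition matrix P, the monitor's uncertainty after observing state s some k slots ago is the Shannon entropy of the predictive distribution, i.e., of the corresponding row of P^k. The sketch below is an illustrative computation under this reading of the abstract (the function name `uoi` and the example chain are ours, not from the paper); it shows the two properties the abstract highlights: UoI grows with the age of the observation and evolves differently depending on which value was last observed.

```python
import numpy as np

def uoi(P, last_obs, age):
    """Uncertainty of information (bits): Shannon entropy of the
    source-state distribution, given that state `last_obs` was
    observed `age` slots ago and the chain has transition matrix P."""
    # Predictive distribution: row `last_obs` of P^age.
    belief = np.linalg.matrix_power(P, age)[last_obs]
    p = belief[belief > 0]          # drop zero entries (0*log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

# A two-state example chain (hypothetical numbers for illustration).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

for s in (0, 1):
    print(f"last_obs={s}:", [round(uoi(P, s, k), 3) for k in (0, 1, 5)])
```

A fresh observation gives zero UoI; as the age grows, the predictive distribution approaches the stationary distribution and the entropy rises toward a limit, at different rates for the two observed values. This value-dependence is exactly what distinguishes UoI from age-based metrics such as AoI.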