Title
A Certainty Equivalence Result in Team-Optimal Control of Mean-Field Coupled Markov Chains
Authors
Abstract
This paper studies a large number of homogeneous Markov decision processes whose transition probabilities and costs are coupled through the empirical distribution of states (also called the mean field). The state of each process is not known to the others, which means that the information structure is fully decentralized. The objective is to minimize the average cost, defined as the empirical mean of the individual costs, for which a sub-optimal solution is proposed. This solution does not depend on the number of processes, yet it converges to the optimal solution of the so-called mean-field sharing information structure as the number of processes tends to infinity. Under some mild conditions, it is shown that the convergence rate of the proposed decentralized solution is proportional to the square root of the inverse of the number of processes. In general, finding this sub-optimal solution involves a non-smooth non-convex optimization problem over an uncountable set. To overcome this drawback, a combinatorial optimization problem is introduced that achieves the same rate of convergence.
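The square-root rate mentioned in the abstract is typically rooted in how fast the empirical distribution of states concentrates around its deterministic mean-field limit. The sketch below illustrates only that concentration effect on a hypothetical toy model; it is not the paper's model, strategy, or proof. Assumptions (all hypothetical): binary states, a fixed decentralized policy already folded into the transition rule `p_one(x, m)`, where `x` is an agent's own state and `m` is the fraction of agents in state 1. It compares the simulated empirical mean field of `n` agents with the deterministic recursion m_{t+1} = m_t P(1 | 1, m_t) + (1 - m_t) P(1 | 0, m_t); if the error scales like 1/sqrt(n), the product `err * sqrt(n)` should stay roughly constant.

```python
import math
import random


def p_one(x, m):
    """Hypothetical P(x_{t+1} = 1 | x_t = x, mean field m) for a toy model.

    The (identical, decentralized) policy is assumed to be folded in: each
    agent acts on its own state only, and the mean field m shifts the odds.
    """
    base = 0.7 if x == 1 else 0.2
    return min(1.0, max(0.0, base + 0.2 * (m - 0.5)))


def mean_field_path(m0, horizon):
    """Deterministic mean-field recursion m_{t+1} = sum_x m_t(x) P(1 | x, m_t)."""
    path = [m0]
    m = m0
    for _ in range(horizon):
        m = m * p_one(1, m) + (1 - m) * p_one(0, m)
        path.append(m)
    return path


def finite_population_error(n, horizon, m0, trials, rng):
    """Average over runs of max_t |m_t^n - m_t| for a population of n agents."""
    limit = mean_field_path(m0, horizon)
    total = 0.0
    for _ in range(trials):
        k = round(m0 * n)
        states = [1] * k + [0] * (n - k)
        worst = abs(k / n - limit[0])
        for t in range(horizon):
            m = sum(states) / n  # empirical mean field that couples the agents
            states = [1 if rng.random() < p_one(x, m) else 0 for x in states]
            worst = max(worst, abs(sum(states) / n - limit[t + 1]))
        total += worst
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 400, 1600):
        err = finite_population_error(n, horizon=10, m0=0.5, trials=30, rng=rng)
        print(n, round(err, 4), round(err * math.sqrt(n), 3))
```

In this toy model the mean-field map reduces to m_{t+1} = 0.1 + 0.7 m_t, a contraction with fixed point 1/3, so finite-population fluctuations do not accumulate over the horizon; each quadrupling of `n` should roughly halve the error.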