Paper Title

Refined Analysis of FPL for Adversarial Markov Decision Processes

Paper Authors

Wang, Yuanhao; Dong, Kefan

Paper Abstract

We consider the adversarial Markov Decision Process (MDP) problem, where the rewards for the MDP can be adversarially chosen and the transition function can be either known or unknown. In both settings, Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in previous literature. However, the established regret bounds for FPL-based algorithms are worse than those of algorithms based on mirror descent. We improve the analysis of FPL-based algorithms in both settings, matching the current best regret bounds using faster and simpler algorithms.
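The paper studies FPL-based algorithms for adversarial MDPs, but the core Follow-the-Perturbed-Leader idea is easiest to see in the classical full-information experts setting: add random perturbations to the cumulative rewards and follow the resulting leader. The sketch below illustrates that generic scheme only, not the paper's MDP algorithms or regret analysis; the function name, the perturbation-scale parameter `eta`, and the toy reward stream are assumptions made purely for illustration.

```python
import numpy as np

def fpl_expert_selection(cumulative_rewards, eta, rng):
    """One round of Follow-the-Perturbed-Leader (FPL) in the classical
    experts setting: perturb cumulative rewards with i.i.d. exponential
    noise and pick the perturbed leader. `eta` is a hypothetical
    parameter controlling the perturbation scale (not from the paper)."""
    noise = rng.exponential(scale=1.0 / eta, size=cumulative_rewards.shape)
    return int(np.argmax(cumulative_rewards + noise))

# Toy usage: 4 experts, rewards revealed each round (full information).
rng = np.random.default_rng(0)
n_experts, n_rounds, eta = 4, 100, 0.5
cum_rewards = np.zeros(n_experts)
total_reward = 0.0
for t in range(n_rounds):
    arm = fpl_expert_selection(cum_rewards, eta, rng)
    rewards = rng.uniform(0, 1, size=n_experts)  # stand-in for adversarially chosen rewards
    total_reward += rewards[arm]
    cum_rewards += rewards  # update cumulative rewards of all experts
print(f"FPL total reward after {n_rounds} rounds: {total_reward:.2f}")
```

A key appeal of FPL, and a motivation the abstract points to, is that each round only requires one argmax over perturbed cumulative rewards (an "oracle" call), which tends to be faster and simpler than maintaining and projecting a full distribution as mirror-descent-style methods do.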
