Paper Title

Robust Imitation via Mirror Descent Inverse Reinforcement Learning

Authors

Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang

Abstract

Recently, adversarial imitation learning has shown a scalable reward acquisition method for inverse reinforcement learning (IRL) problems. However, estimated reward signals often become uncertain and fail to train a reliable statistical model since the existing methods tend to solve hard optimization problems directly. Inspired by a first-order optimization method called mirror descent, this paper proposes to predict a sequence of reward functions, which are iterative solutions for a constrained convex problem. IRL solutions derived by mirror descent are tolerant to the uncertainty incurred by target density estimation since the amount of reward learning is regulated with respect to local geometric constraints. We prove that the proposed mirror descent update rule ensures robust minimization of a Bregman divergence in terms of a rigorous regret bound of $\mathcal{O}(1/T)$ for step sizes $\{\eta_t\}_{t=1}^{T}$. Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks.
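For reference, the standard mirror descent (proximal) update that the abstract alludes to can be sketched as follows. This is a generic formulation only, not the paper's exact objective; the symbols $r_t$ (reward iterate), $\ell_t$ (per-step loss), $\psi$ (convex potential), $D_\psi$ (induced Bregman divergence), and $\mathcal{R}$ (feasible reward set) are illustrative assumptions rather than the authors' notation.

$$
r_{t+1} \;=\; \arg\min_{r \in \mathcal{R}} \Big\{ \eta_t \, \langle \nabla \ell_t(r_t),\, r - r_t \rangle \;+\; D_\psi(r \,\|\, r_t) \Big\},
\qquad
D_\psi(r \,\|\, r') \;=\; \psi(r) - \psi(r') - \langle \nabla \psi(r'),\, r - r' \rangle .
$$

In this generic form, the Bregman proximity term $D_\psi(r \,\|\, r_t)$ is what penalizes large deviations from the current reward iterate, which corresponds to the abstract's statement that the amount of reward learning is regulated by local geometric constraints via the step sizes $\{\eta_t\}$.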
