Efficient Wasserstein Natural Gradients for Reinforcement Learning

Paper Title

Efficient Wasserstein Natural Gradients for Reinforcement Learning

Authors

Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

Abstract

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
