Paper title
Efficient Wasserstein Natural Gradients for Reinforcement Learning
Paper authors
Paper abstract
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
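To make the idea of a natural gradient under a Wasserstein geometry concrete, here is a minimal sketch on a toy problem. It is not the estimator from the paper, which the abstract does not describe. It assumes a one-dimensional Gaussian search distribution N(mu, sigma^2) parameterized by (mu, s) with sigma = exp(s), for which the squared 2-Wasserstein distance between two members is (mu1 - mu2)^2 + (sigma1 - sigma2)^2, so the induced metric in (mu, s) coordinates is diag(1, sigma^2). The objective, the target value 3.0, and all function names (`metric_diag`, `grad_objective`, `wng_step`, `run_demo`) are hypothetical and chosen only for illustration.

```python
import math


def metric_diag(s):
    """Diagonal of the Wasserstein-2 metric for N(mu, exp(s)^2) in (mu, s) coordinates."""
    sigma = math.exp(s)
    return (1.0, sigma * sigma)


def grad_objective(mu, s, target=3.0):
    """Euclidean gradient of J(mu, s) = E[-(x - target)^2] = -((mu - target)^2 + sigma^2)."""
    sigma = math.exp(s)
    return (-2.0 * (mu - target), -2.0 * sigma * sigma)


def wng_step(mu, s, step_size=0.1, damping=1e-8):
    """One damped natural-gradient ascent step: theta + step_size * G^{-1} grad."""
    g_mu, g_s = grad_objective(mu, s)
    m_mu, m_s = metric_diag(s)
    # The metric is diagonal here, so applying G^{-1} is an elementwise division.
    return (mu + step_size * g_mu / (m_mu + damping),
            s + step_size * g_s / (m_s + damping))


def run_demo(steps=50):
    """Run a few steps from (mu, s) = (0, 0) and return the final parameters."""
    mu, s = 0.0, 0.0
    for _ in range(steps):
        mu, s = wng_step(mu, s)
    return mu, s


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the preconditioned update for s is nearly constant per step, whereas the plain gradient for s shrinks with sigma^2. This illustrates, in a hedged way, how accounting for the geometry can speed up optimization; a general policy class has no such closed-form metric, so the metric would have to be estimated.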