Guided Curriculum Learning for Walking Over Complex Terrain

Paper Title

Guided Curriculum Learning for Walking Over Complex Terrain

Authors

Brendan Tidd, Nicolas Hudson, Akansel Cosgun

Abstract

Reliable bipedal walking over complex terrain is a challenging problem, and using a curriculum can help learning. Curriculum learning is the idea of starting with an achievable version of a task and increasing the difficulty as success criteria are met. We propose a 3-stage curriculum to train Deep Reinforcement Learning policies for bipedal walking over various challenging terrains. In the first stage, the agent starts on an easy terrain and the terrain difficulty is gradually increased, while forces derived from a target policy are applied to the robot joints and the base. In the second stage, the guiding forces are gradually reduced to zero. Finally, in the third stage, random perturbations with increasing magnitude are applied to the robot base, so that the robustness of the policies is improved. In simulation experiments, we show that our approach is effective in learning separate walking policies for five terrain types: flat, hurdles, gaps, stairs, and steps. Moreover, we demonstrate that in the absence of human demonstrations, a simple hand-designed walking trajectory is a sufficient prior for learning to traverse complex terrain types. In ablation studies, we show that taking out any one of the three stages of the curriculum degrades learning performance.
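
The three-stage schedule described in the abstract can be pictured as a small state machine. The following is only a minimal sketch, not the authors' implementation: the names `CurriculumState` and `advance`, the normalized 0-to-1 scales, the success threshold of 0.8, and the step size of 0.1 are all assumptions made for illustration, since the abstract does not give these details.

```python
from dataclasses import dataclass, replace


@dataclass
class CurriculumState:
    """Hypothetical curriculum parameters, each normalized to [0, 1]."""
    stage: int = 1
    terrain_difficulty: float = 0.0  # 0 = easy terrain, 1 = full difficulty
    guide_scale: float = 1.0         # 1 = full guiding forces, 0 = none
    perturb_scale: float = 0.0       # 0 = no perturbations, 1 = maximum


def advance(state, success_rate, threshold=0.8, step=0.1):
    """Return the next curriculum state.

    The state only changes when the (assumed) success criterion is met.
    Values are rounded to avoid floating-point drift when accumulating steps.
    """
    if success_rate < threshold:
        return state
    s = replace(state)  # work on a copy, leave the input untouched
    if s.stage == 1:
        # Stage 1: raise terrain difficulty while guiding forces stay on.
        s.terrain_difficulty = round(min(1.0, s.terrain_difficulty + step), 6)
        if s.terrain_difficulty >= 1.0:
            s.stage = 2
    elif s.stage == 2:
        # Stage 2: reduce the guiding forces to zero.
        s.guide_scale = round(max(0.0, s.guide_scale - step), 6)
        if s.guide_scale <= 0.0:
            s.stage = 3
    else:
        # Stage 3: increase the magnitude of random base perturbations.
        s.perturb_scale = round(min(1.0, s.perturb_scale + step), 6)
    return s
```

In a training loop, `advance` would be called after each evaluation window with the measured success rate, and the returned scales would be used to configure the terrain, the guiding forces, and the perturbations for the next window.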
