Energy-based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning

Paper Title

Energy-based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning

Authors

Gan, Lu; Grizzle, Jessy W.; Eustice, Ryan M.; Ghaffari, Maani

Abstract

This work reports on developing a deep inverse reinforcement learning method for legged robots terrain traversability modeling that incorporates both exteroceptive and proprioceptive sensory data. Existing works use robot-agnostic exteroceptive environmental features or handcrafted kinematic features; instead, we propose to also learn robot-specific inertial features from proprioceptive sensory data for reward approximation in a single deep neural network. Incorporating the inertial features can improve the model fidelity and provide a reward that depends on the robot's state during deployment. We train the reward network using the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) algorithm and propose simultaneously minimizing a trajectory ranking loss to deal with the suboptimality of legged robot demonstrations. The demonstrated trajectories are ranked by locomotion energy consumption, in order to learn an energy-aware reward function and a more energy-efficient policy than demonstration. We evaluate our method using a dataset collected by an MIT Mini-Cheetah robot and a Mini-Cheetah simulator. The code is publicly available at https://github.com/ganlumomo/minicheetah-traversability-irl.
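
The abstract does not spell out the form of the trajectory ranking loss, so the following is only an illustrative sketch of one common choice: a pairwise, Bradley-Terry style loss in which the demonstration with lower locomotion energy should receive the higher predicted return. The function names, the pairwise form, and the sample numbers are assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def trajectory_return(rewards):
    """Sum the per-step predicted rewards along one trajectory."""
    return sum(rewards)


def pairwise_ranking_loss(return_better, return_worse):
    """Negative log-probability that the better trajectory is preferred.

    Equals log(1 + exp(-(return_better - return_worse))), evaluated in a
    numerically stable way. Assumed form, not the paper's exact loss.
    """
    diff = return_better - return_worse
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def energy_ranking_loss(returns, energies):
    """Average pairwise loss over all pairs with different energy.

    A trajectory with lower energy consumption is treated as the better
    one of each pair (hypothetical ranking rule for this sketch).
    """
    total, count = 0.0, 0
    for i in range(len(returns)):
        for j in range(len(returns)):
            if energies[i] < energies[j]:
                total += pairwise_ranking_loss(returns[i], returns[j])
                count += 1
    return total / count if count else 0.0


# Made-up example: two demonstrations with predicted step rewards and
# measured energy consumption (arbitrary units).
returns = [trajectory_return([1.0, 0.5]), trajectory_return([0.2, 0.1])]
energies = [10.0, 25.0]
loss = energy_ranking_loss(returns, energies)
```

In a setup like the one the abstract describes, a term of this kind would be minimized together with the MEDIRL objective, so that the learned reward prefers lower-energy demonstrations rather than treating every demonstration as equally optimal.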
