TrueRMA: Learning Fast and Smooth Robot Trajectories with Recursive Midpoint Adaptations in Cartesian Space

Paper Title

TrueRMA: Learning Fast and Smooth Robot Trajectories with Recursive Midpoint Adaptations in Cartesian Space

Authors

Kiemel, Jonas C., Meißner, Pascal, Kröger, Torsten

Abstract

We present TrueRMA, a data-efficient, model-free method to learn cost-optimized robot trajectories over a wide range of starting points and endpoints. The key idea is to calculate trajectory waypoints in Cartesian space by recursively predicting orthogonal adaptations relative to the midpoints of straight lines. We generate a differentiable path by adding circular blends around the waypoints, calculate the corresponding joint positions with an inverse kinematics solver and calculate a time-optimal parameterization considering velocity and acceleration limits. During training, the trajectory is executed in a physics simulator and costs are assigned according to a user-specified cost function which is not required to be differentiable. Given a starting point and an endpoint as input, a neural network is trained to predict midpoint adaptations that minimize the cost of the resulting trajectory via reinforcement learning. We successfully train a KUKA iiwa robot to keep a ball on a plate while moving between specified points and compare the performance of TrueRMA against two baselines. The results show that our method requires less training data to learn the task while generating shorter and faster trajectories.
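The recursive midpoint idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the two-scalar parameterization of the orthogonal offset, and the `predict_adaptation` callback (a stand-in for the trained neural network) are all assumptions made for illustration. The blending, inverse kinematics, and time-parameterization steps are omitted.

```python
import math


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def orthonormal_basis(direction):
    """Return two unit vectors spanning the plane orthogonal to `direction`."""
    d = _scale(direction, 1.0 / _norm(direction))
    # Use a world axis that is not parallel to d so the cross product is non-zero.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _cross(d, helper)
    u = _scale(u, 1.0 / _norm(u))
    v = _cross(d, u)
    return u, v


def recursive_midpoint_waypoints(start, end, predict_adaptation, depth):
    """Return the waypoints strictly between `start` and `end`.

    `predict_adaptation(start, end)` is a hypothetical policy that returns two
    scalars (a, b): the offset of the midpoint along the two orthogonal axes.
    A recursion depth d yields 2**d - 1 intermediate waypoints.
    """
    if depth == 0 or _norm(_sub(end, start)) < 1e-12:
        return []
    midpoint = _scale(_add(start, end), 0.5)
    u, v = orthonormal_basis(_sub(end, start))
    a, b = predict_adaptation(start, end)
    # Shift the straight-line midpoint orthogonally to the segment.
    adapted = _add(midpoint, _add(_scale(u, a), _scale(v, b)))
    left = recursive_midpoint_waypoints(start, adapted, predict_adaptation, depth - 1)
    right = recursive_midpoint_waypoints(adapted, end, predict_adaptation, depth - 1)
    return left + [adapted] + right


if __name__ == "__main__":
    # Toy policy (assumption): offset by 10% of the segment length along one axis.
    def toy_policy(s, e):
        return 0.1 * _norm(_sub(e, s)), 0.0

    start, end = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    path = [start] + recursive_midpoint_waypoints(start, end, toy_policy, 2) + [end]
    for point in path:
        print(point)
```

In a full pipeline as the abstract describes it, the resulting waypoint list would then be smoothed with circular blends, mapped to joint positions by an inverse kinematics solver, and time-parameterized under velocity and acceleration limits before being scored in simulation.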

