Paper title
Learning Solution Manifolds for Control Problems via Energy Minimization
Paper authors
Paper abstract
A variety of control tasks such as inverse kinematics (IK), trajectory optimization (TO), and model predictive control (MPC) are commonly formulated as energy minimization problems. Numerical solutions to such problems are well-established. However, these are often too slow to be used directly in real-time applications. The alternative is to learn solution manifolds for control problems in an offline stage. Although this distillation process can be trivially formulated as a behavioral cloning (BC) problem in an imitation learning setting, our experiments highlight a number of significant shortcomings arising from incompatible local minima, interpolation artifacts, and insufficient coverage of the state space. In this paper, we propose an alternative to BC that is efficient and numerically robust. We formulate the learning of solution manifolds as a minimization of the energy terms of a control objective integrated over the space of problems of interest. We minimize this energy integral with a novel method that combines Monte Carlo-inspired adaptive sampling strategies with the derivatives used to solve individual instances of the control task. We evaluate the performance of our formulation on a series of robotic control problems of increasing complexity, and we highlight its benefits through comparisons against traditional methods such as behavioral cloning and dataset aggregation (DAgger).
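The contrast the abstract draws between behavioral cloning and direct minimization of an energy integral can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional energy E(u, p) = (u^2 - p)^2 with two equally good minima u = ±sqrt(p), the linear policy u = theta_0 + theta_1 p, the problem range p in [0.5, 2], and the step size. It also uses plain uniform Monte Carlo sampling rather than the adaptive sampling strategy the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(u, p):
    # Hypothetical toy control energy with two minima, u = +sqrt(p) and u = -sqrt(p).
    return (u**2 - p) ** 2

def energy_grad_u(u, p):
    # Derivative of the toy energy with respect to the control variable u.
    return 4.0 * u * (u**2 - p)

def features(p):
    # Linear-in-p policy features [1, p]; an assumed stand-in for a neural network.
    return np.stack([np.ones_like(p), p], axis=-1)

def policy(theta, p):
    return features(p) @ theta

# Behavioral cloning baseline: least-squares fit to solver outputs that landed
# in different (incompatible) local minima, modeled here by a random sign.
p_data = rng.uniform(0.5, 2.0, size=2000)
signs = rng.choice([-1.0, 1.0], size=p_data.shape)
u_data = signs * np.sqrt(p_data)
theta_bc, *_ = np.linalg.lstsq(features(p_data), u_data, rcond=None)

# Direct approach: stochastic gradient descent on the Monte Carlo estimate of
# the energy integrated over the problem space (uniform samples of p).
theta = np.array([0.5, 0.0])
learning_rate = 0.01
for _ in range(3000):
    p = rng.uniform(0.5, 2.0, size=64)
    u = policy(theta, p)
    # Chain rule: dE/dtheta = dE/du * du/dtheta, averaged over the batch.
    grad = (energy_grad_u(u, p)[:, None] * features(p)).mean(axis=0)
    theta -= learning_rate * grad

p_test = np.linspace(0.5, 2.0, 200)
energy_bc = energy(policy(theta_bc, p_test), p_test).mean()
energy_direct = energy(policy(theta, p_test), p_test).mean()
print(f"mean energy, behavioral cloning:  {energy_bc:.4f}")
print(f"mean energy, direct minimization: {energy_direct:.4f}")
```

In this sketch the cloned policy averages the two solution branches and predicts controls near zero, which leaves the energy high, whereas the directly trained policy commits to a single branch because its gradient comes from the energy itself and not from conflicting labels.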