On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control

Paper Title

On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control

Authors

Tyler Westenbroek, Anand Siththaranjan, Mohsin Sarwari, Claire J. Tomlin, Shankar S. Sastry

Abstract

Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost function interacts with the underlying structure of the control system and impacts the amount of computation required to obtain a stabilizing controller. We treat the cost design problem as a two-step process where the designer specifies outputs for the system that are to be penalized and then modulates the relative weighting of the inputs and the outputs in the cost. To characterize the computational burden associated to obtaining a stabilizing controller with a particular cost, we bound the prediction horizon required by receding horizon methods and the number of iterations required by dynamic programming methods to meet this requirement. Our theoretical results highlight a qualitative separation between what is possible, from a design perspective, when the chosen outputs induce either minimum-phase or non-minimum-phase behavior. Simulation studies indicate that this separation also holds for modern reinforcement learning methods.
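The two-step cost design described in the abstract (choose the penalized outputs, then weight inputs against outputs) and the question of how long a prediction horizon must be can be illustrated on a toy problem. The sketch below is not the paper's method or its bounds: it assumes a discrete-time linear system, a stage cost ||C x||^2 + r ||u||^2 with no terminal cost, and hypothetical matrices `A`, `B`, `C` chosen only for illustration. It searches for the shortest horizon at which the receding horizon feedback stabilizes the closed loop.

```python
import numpy as np


def receding_horizon_gain(A, B, C, r, horizon):
    """First-step feedback gain of a finite-horizon LQ problem.

    Stage cost: ||C x||^2 + r * ||u||^2, with no terminal cost (an
    assumption made for this sketch).
    """
    Q = C.T @ C                      # output penalty chosen by the designer
    R = r * np.eye(B.shape[1])       # relative weight on the input
    P = np.zeros_like(Q)             # cost-to-go with zero steps remaining
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(horizon):
        # Gain that is optimal when the cost-to-go after this step is P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Backward Riccati step: cost-to-go with one more step remaining.
        P = Q + A.T @ P @ (A - B @ K)
    return K


def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return max(abs(np.linalg.eigvals(M)))


def min_stabilizing_horizon(A, B, C, r, max_horizon=200):
    """Smallest horizon whose receding horizon gain stabilizes x+ = A x + B u.

    Returns None if no horizon up to max_horizon works.
    """
    for horizon in range(1, max_horizon + 1):
        K = receding_horizon_gain(A, B, C, r, horizon)
        if spectral_radius(A - B @ K) < 1.0:
            return horizon
    return None


# Hypothetical unstable plant and output, for illustration only.
A = np.array([[1.2, 1.0], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

for r in (0.01, 1.0, 100.0):
    print("input weight", r, "-> shortest stabilizing horizon",
          min_stabilizing_horizon(A, B, C, r))
```

Changing `C` changes which output is penalized, and changing `r` changes the input/output weighting, so the same loop can be used to explore how each design choice moves the required horizon in this linear setting.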
