Constructing MDP Abstractions Using Data with Formal Guarantees

Paper Title

Constructing MDP Abstractions Using Data with Formal Guarantees

Authors

Abolfazl Lavaei, Sadegh Soudjani, Emilio Frazzoli, Majid Zamani

Abstract

This paper is concerned with a data-driven technique for constructing finite Markov decision processes (MDPs) as finite abstractions of discrete-time stochastic control systems with unknown dynamics, while providing formal closeness guarantees. The proposed scheme is based on the notion of stochastic bisimulation functions (SBFs), which capture the probabilistic distance between state trajectories of an unknown stochastic system and those of a finite MDP. In our proposed setting, we first reformulate the corresponding conditions of SBFs as a robust convex program (RCP). We then propose a scenario convex program (SCP) associated with the original RCP by collecting a finite number of data samples from trajectories of the system. We ultimately construct an SBF between the data-driven finite MDP and the unknown stochastic system with a given confidence level by establishing a probabilistic relation between the optimal values of the SCP and the RCP. We also propose two different approaches for the construction of finite MDPs from data. We illustrate the efficacy of our results on a nonlinear jet engine compressor with unknown dynamics. We construct a data-driven finite MDP as a suitable substitute for the original system in order to synthesize controllers that keep the system in a safe set with some probability of satisfaction and a desirable confidence level.
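The step from an RCP to an SCP can be illustrated with a toy problem. The sketch below is not the program from the paper: the constraint function `g`, the interval [0, 1], and the uniform sampling are all assumptions chosen so that the robust optimum is known in closed form. It only shows the general idea that a constraint required for every point of a set is replaced by the same constraint at finitely many sampled points, so the scenario optimum can fall short of the robust one.

```python
import random

# Hypothetical constraint function standing in for an SBF condition.
# It is an assumption for illustration, not taken from the paper.
def g(x: float) -> float:
    return x * (1.0 - x)

# Toy RCP: minimise eta subject to g(x) <= eta for ALL x in [0, 1].
# Its optimum is the supremum of g, attained at x = 0.5.
RCP_OPTIMUM = 0.25

def scp_optimum(samples):
    """Toy SCP: minimise eta subject to g(x_i) <= eta for each sample x_i.

    With a single scalar decision variable, the minimiser is simply
    the largest constraint value observed on the samples.
    """
    return max(g(x) for x in samples)

def draw_samples(n, seed=0):
    """Draw n i.i.d. points uniformly from [0, 1) (assumed sampling law)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        eta = scp_optimum(draw_samples(n))
        # The gap is never negative: the SCP enforces fewer constraints.
        print(n, eta, RCP_OPTIMUM - eta)
```

In this toy setting the scenario optimum never exceeds the robust optimum, and the gap tends to shrink as more samples are drawn. According to the abstract, it is a relation of this kind between the SCP and RCP optimal values that the paper establishes probabilistically, at a given confidence level.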
