Meta-active Learning in Probabilistically-Safe Optimization

Paper Title

Meta-active Learning in Probabilistically-Safe Optimization

Authors

Schrum, Mariah L., Connolly, Mark, Cole, Eric, Ghetiya, Mihir, Gross, Robert, Gombolay, Matthew C.

Abstract

Learning to control a safety-critical system with latent dynamics (e.g., for deep brain stimulation) requires taking calculated risks to gain information as efficiently as possible. To address this problem, we present a probabilistically-safe, meta-active learning approach to efficiently learn system dynamics and optimal configurations. We cast this problem as meta-learning an acquisition function, which is represented by a Long Short-Term Memory (LSTM) network encoding sampling history. This acquisition function is meta-learned offline to learn high-quality sampling strategies. We employ a mixed-integer linear program as our policy, with the final, linearized layers of our LSTM acquisition function directly encoded into the objective to trade off expected information gain (e.g., improvement in the accuracy of the model of system dynamics) with the likelihood of safe control. We set a new state-of-the-art in active learning for control of a high-dimensional system with altered dynamics (i.e., a damaged aircraft), achieving a 46% increase in information gain and a 20% speedup in computation time over baselines. Furthermore, we demonstrate our system's ability to learn the optimal parameter settings for deep brain stimulation in a rat's brain while avoiding unwanted side effects (i.e., triggering seizures), outperforming prior state-of-the-art approaches with a 58% increase in information gain. Additionally, our algorithm achieves a 97% likelihood of terminating in a safe state while losing only 15% of information gain.
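
The trade-off described in the abstract, between expected information gain and the likelihood of safe control, can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: it replaces the mixed-integer linear program with plain enumeration over a finite candidate set, and replaces the LSTM acquisition function with hand-supplied scores. The names `Candidate`, `select_action`, `min_p_safe`, and `safety_weight` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate action with illustrative, hand-supplied scores."""
    name: str
    info_gain: float  # stand-in for the output of a learned acquisition function
    p_safe: float     # stand-in for the estimated probability of remaining safe


def select_action(
    candidates: Sequence[Candidate],
    min_p_safe: float,
    safety_weight: float = 0.0,
) -> Optional[Candidate]:
    """Pick the candidate with the best weighted score among those that
    meet the safety threshold. Returns None if no candidate qualifies.

    Assumption: a finite candidate set searched by enumeration, used here
    in place of the mixed-integer linear program named in the abstract.
    """
    # Chance-constraint-style filter: keep only sufficiently safe candidates.
    feasible = [c for c in candidates if c.p_safe >= min_p_safe]
    if not feasible:
        return None
    # Weighted objective: information gain plus an optional safety bonus.
    return max(feasible, key=lambda c: c.info_gain + safety_weight * c.p_safe)


# Illustrative values only; they are not taken from the paper.
candidates = [
    Candidate("aggressive", info_gain=0.9, p_safe=0.60),
    Candidate("moderate", info_gain=0.6, p_safe=0.97),
    Candidate("cautious", info_gain=0.3, p_safe=0.99),
]
choice = select_action(candidates, min_p_safe=0.95)
print(choice.name if choice else "no safe action")
```

Raising `min_p_safe` shrinks the feasible set and typically lowers the achievable information gain, which is the kind of trade-off the abstract quantifies; raising `safety_weight` shifts the choice toward safer candidates within the feasible set.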

Paper Title