Paper Title

Provable Hierarchical Imitation Learning via EM

Authors

Zhiyu Zhang, Ioannis Paschalidis

Abstract

Due to recent empirical successes, the options framework for hierarchical reinforcement learning is gaining popularity. Rather than learning from rewards, which suffers from the curse of dimensionality, we consider learning an options-type hierarchical policy from expert demonstrations. Such a problem is referred to as hierarchical imitation learning. Converting this problem to parameter inference in a latent variable model, we theoretically characterize the EM approach proposed by Daniel et al. (2016). The population-level algorithm is analyzed as an intermediate step, which is nontrivial because the samples are correlated. If the expert policy can be parameterized by a variant of the options framework, then under regularity conditions, we prove that the proposed algorithm converges with high probability to a norm ball around the true parameter. To our knowledge, this is the first performance guarantee for a hierarchical imitation learning algorithm that only observes primitive state-action pairs.
