Paper Title

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

Paper Authors

Weichao Zhou, Wenchao Li

Paper Abstract

A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment the existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, where the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments, such that the concretized reward machine can discriminate expert-demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.
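To make the formalism concrete, below is a minimal Python sketch of a symbolic reward machine: a finite-state machine whose transitions carry predicates (guards over observations) and symbolic reward outputs that are only bound to numbers by a separate assignment. All names here (`Transition`, `SymbolicRewardMachine`, `r_key`, `r_goal`, the key-and-door example) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a symbolic reward machine, under assumed names;
# the paper's actual formalism and implementation may differ.
from dataclasses import dataclass
from typing import Callable, Dict, List

# A predicate maps an environment observation (here, a dict of labels) to a bool.
Predicate = Callable[[dict], bool]

@dataclass
class Transition:
    src: str               # source machine state
    dst: str               # destination machine state
    predicate: Predicate   # guard: the transition fires when it holds
    reward_symbol: str     # symbolic reward output, concretized by an assignment

class SymbolicRewardMachine:
    """Finite-state machine whose transitions carry predicates and
    symbolic reward outputs; symbols are bound to numbers afterwards."""

    def __init__(self, initial_state: str, transitions: List[Transition]):
        self.state = initial_state
        self.transitions = transitions

    def step(self, obs: dict, assignment: Dict[str, float]) -> float:
        """Advance on one observation; return the concrete reward under
        `assignment`, a mapping from reward symbols to numeric values."""
        for t in self.transitions:
            if t.src == self.state and t.predicate(obs):
                self.state = t.dst
                return assignment[t.reward_symbol]
        return 0.0  # no guarded transition fired; assume zero reward

# Example task: the agent must pick up a key, then reach a door.
rm = SymbolicRewardMachine(
    initial_state="start",
    transitions=[
        Transition("start", "has_key", lambda o: o.get("at_key", False), "r_key"),
        Transition("has_key", "done", lambda o: o.get("at_door", False), "r_goal"),
    ],
)
assignment = {"r_key": 0.1, "r_goal": 1.0}  # one concretization of the symbols
print(rm.step({"at_key": True}, assignment))   # 0.1
print(rm.step({"at_door": True}, assignment))  # 1.0
```

In this reading, the inverse reinforcement learning problem described in the abstract amounts to searching over `assignment`: the hierarchical Bayesian approach infers the symbol values under which the concretized machine scores expert-demonstrated trajectories higher than other trajectories.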
