Paper Title

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

Authors

Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

Abstract

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test time. Some policy interpretability methods facilitate this by capturing the policy's decision making in a set of agent rollouts. However, even the most informative trajectories of training-time behavior may give little insight into the agent's behavior out of distribution. In contrast, our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution. We generate these trajectories by guiding the agent to more diverse unseen states and showing the agent's behavior there. In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
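The core idea described in the abstract, guiding the agent into more diverse, unseen states and then recording how the trained policy behaves from there, can be illustrated with a minimal rollout sketch. The example below is an assumption-laden illustration, not the paper's algorithm: it assumes a Gymnasium-style environment and a policy callable, and uses a random "guide" phase as a hypothetical stand-in for whatever guidance mechanism the paper actually employs.

```python
import gymnasium as gym


def counterfactual_rollout(env, policy, guide_steps=10, horizon=200):
    """Roll the trained policy out starting from a less-familiar state.

    Illustrative sketch only: for the first `guide_steps` steps a random
    "guide" policy pushes the environment toward states the trained policy
    may not have visited during training; control is then handed back to
    `policy`, and its behavior from there is recorded for inspection.
    The paper's actual guidance mechanism may differ.
    """
    obs, _ = env.reset()
    trajectory = []
    for t in range(horizon):
        if t < guide_steps:
            # Guidance phase: steer toward unfamiliar states (random actions here).
            action = env.action_space.sample()
        else:
            # Evaluation phase: record the trained policy's own decisions.
            action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if terminated or truncated:
            break
    return trajectory


# Hypothetical usage: collect a few out-of-distribution rollouts to show a reviewer.
if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    stand_in_policy = lambda obs: env.action_space.sample()  # replace with a trained policy
    for i in range(3):
        traj = counterfactual_rollout(env, stand_in_policy)
        print(f"rollout {i}: {len(traj)} steps")
```

The key design point this sketch conveys is the two-phase structure: the guide phase broadens the distribution of start states, while the evaluation phase shows only the agent's own decisions, so the resulting trajectories reflect how the policy behaves under distribution shift.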
