Paper Title
SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning
Paper Authors
Paper Abstract
Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies that are not easily transferable to other tasks. In this paper, we address one aspect of the open problem of integrating these approaches: how can decision-making agents resolve discrepancies in their symbolic planning models while attempting to accomplish goals? We propose an integrated framework named SPOTTER that uses RL to augment and support ("spot") a planning agent by discovering new operators needed by the agent to accomplish goals that are initially unreachable for the agent. SPOTTER outperforms pure-RL approaches while also discovering transferable symbolic knowledge and does not require supervision, successful plan traces or any a priori knowledge about the missing planning operator.
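The integration loop the abstract describes (plan toward the goal, hit an impasse when the symbolic model lacks a needed operator, use RL to discover that operator, then replan with the augmented model) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Operator` class, the breadth-first planner, and `learn_operator_stub` are all assumed names, and the RL discovery phase is replaced by a stub that fabricates the missing operator directly.

```python
from collections import deque

class Operator:
    """A STRIPS-style operator: preconditions, add list, delete list."""
    def __init__(self, name, pre, add, delete):
        self.name = name
        self.pre = frozenset(pre)
        self.add = frozenset(add)
        self.delete = frozenset(delete)

    def applicable(self, state):
        return self.pre <= state

    def apply(self, state):
        return (state - self.delete) | self.add

def plan(state, goal, operators):
    """Breadth-first forward search; returns an action sequence or None."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        s, path = frontier.popleft()
        if frozenset(goal) <= s:
            return path
        for op in operators:
            if op.applicable(s):
                ns = op.apply(s)
                if ns not in seen:
                    seen.add(ns)
                    frontier.append((ns, path + [op.name]))
    return None

def learn_operator_stub(state, goal):
    """Stand-in for SPOTTER's RL phase. In the paper this operator would be
    distilled from environment interaction; here it is hard-coded."""
    return Operator("learned_unlock", pre={"has_key"}, add={"door_open"}, delete=set())

def spotter_loop(state, goal, operators):
    """Plan; on impasse, learn a missing operator and replan."""
    p = plan(state, goal, operators)
    if p is None:
        operators = operators + [learn_operator_stub(state, goal)]
        p = plan(state, goal, operators)
    return p, operators
```

For example, with only a `get_key` operator, the goal `door_open` is initially unreachable; the loop then acquires `learned_unlock` and finds the plan `["get_key", "learned_unlock"]`. The learned operator remains in the operator set afterwards, mirroring the abstract's point that the discovered symbolic knowledge transfers to later tasks.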