Paper Title

Disentangled Planning and Control in Vision Based Robotics via Reward Machines

Paper Authors

Alberto Camacho, Jacob Varley, Deepali Jain, Atil Iscen, Dmitry Kalashnikov

Paper Abstract

In this work, we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion. The reward machine can be used both for reward shaping and for informing the policy which abstract state it is currently in. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals from the reward machine, reward shaping and knowledge of the current abstract state, complement each other and can both be used to improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to try to get a policy to learn the task without this structure.
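
To make the reward-machine idea concrete, below is a minimal Python sketch of an RM for a pick-and-place task. The abstract states ("start", "holding", "done"), the events ("grasped", "dropped", "placed"), and the reward values are illustrative assumptions for exposition, not the exact construction used in the paper.

```python
# Minimal sketch of a reward machine (RM) for a pick-and-place task.
# All state names, events, and reward values here are illustrative
# assumptions, not the paper's exact construction.

class RewardMachine:
    """Finite state machine over task-relevant events.

    Each abstract state is a high-level simplification of the raw
    (e.g. image) state; transitions fire on detected events such as
    'grasped' or 'placed', and each transition carries a shaping reward.
    """

    def __init__(self):
        # (abstract_state, event) -> (next_abstract_state, shaping_reward)
        self.transitions = {
            ("start", "grasped"): ("holding", 0.5),
            ("holding", "dropped"): ("start", -0.5),
            ("holding", "placed"): ("done", 1.0),
        }
        self.state = "start"

    def step(self, event):
        """Advance the machine on an event; return the shaping reward.

        Unrecognized (state, event) pairs leave the machine where it is
        and yield zero reward.
        """
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def is_done(self):
        return self.state == "done"


# The current abstract state can be exposed to the policy alongside the
# camera observation, e.g. as a one-hot vector, giving the agent the
# second supervisory signal described in the abstract.
rm = RewardMachine()
print(rm.step("grasped"))  # 0.5, now in abstract state 'holding'
print(rm.step("placed"))   # 1.0, task complete
print(rm.is_done())        # True
```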
