Learning Performance Graphs from Demonstrations via Task-Based Evaluations

Paper title

Learning Performance Graphs from Demonstrations via Task-Based Evaluations

Authors

Puranic, Aniruddh G., Deshmukh, Jyotirmoy V., Nikolaidis, Stefanos

Abstract

In the learning from demonstration (LfD) paradigm, understanding and evaluating the demonstrated behaviors play a critical role in extracting control policies for robots. Without this knowledge, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. Recent work has proposed an LfD framework in which a user provides a set of formal task specifications to guide LfD and thereby address the challenge of reward shaping. However, in this framework the specifications are manually ordered in a performance graph (a partial order that specifies the relative importance of the specifications). The main contribution of this paper is an algorithm that learns the performance graph directly from user-provided demonstrations; we show that reward functions generated from the learned performance graph yield policies similar to those obtained from manually specified performance graphs. We performed a user study showing that the priorities users specify over behaviors in a simulated highway driving domain match the automatically inferred performance graph. This establishes that user demonstrations can be accurately evaluated with respect to task specifications without expert criteria.
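To make the notion of a performance graph concrete, here is a minimal sketch of the data structure only, not the authors' learning algorithm. It assumes each task specification is a named node and that an edge records "this specification is more important than that one". The specification names are hypothetical highway-driving examples. The sketch uses Python's standard `graphlib` module (Python 3.9+) to reject cyclic orderings, which cannot form a partial order. It then groups the specifications into priority levels, and specifications within the same level are incomparable.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical specification names (illustrative only, not from the paper).
# Each key maps a specification to the set of specifications ranked MORE
# important than it, i.e., its predecessors in the partial order.
performance_graph = {
    "keep_lane": {"no_collision"},
    "reach_goal": {"no_collision"},
    "maintain_speed": {"keep_lane", "reach_goal"},
}


def priority_levels(graph):
    """Group specifications into levels; level 0 holds the most important ones.

    Specifications placed in the same level are incomparable under the
    partial order. Raises graphlib.CycleError if the graph has a cycle,
    since a cyclic ordering is not a valid partial order.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # detects cycles before any node is emitted
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose predecessors are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels


print(priority_levels(performance_graph))
```

For the example graph above, this prints `[['no_collision'], ['keep_lane', 'reach_goal'], ['maintain_speed']]`. In this sketch, `keep_lane` and `reach_goal` are left unordered relative to each other, which is what separates a partial order from a strict ranking.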
