Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Paper Title

Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Authors

Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

Abstract

The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current methods are still limited on both fronts despite extensive efforts. In this paper, we introduce the Evolving Graphical Planner (EGP), a model that performs global planning for navigation based on raw sensory input. The model dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation. We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task through pure imitation learning, outperforming previous navigation architectures by up to 5%.
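The abstract names three ideas: a graph that grows as the agent explores, an action space that covers more than the immediately adjacent nodes, and planning over the graph. The sketch below is only a toy illustration of those ideas, not the authors' model. The class `EvolvingGraph` and all of its methods are hypothetical names, the node labels are made up, and the learned components (instruction grounding, node scoring, the proxy graph) are left out entirely; plain breadth-first search stands in for planning.

```python
from collections import deque


class EvolvingGraph:
    """Toy navigation graph that grows as the agent explores (illustrative only)."""

    def __init__(self, start):
        self.edges = {start: set()}  # adjacency over every node seen so far
        self.visited = {start}       # nodes the agent has actually stood on
        self.current = start

    def observe(self, neighbors):
        """Add nodes visible from the current position to the graph."""
        for n in neighbors:
            self.edges.setdefault(n, set()).add(self.current)
            self.edges[self.current].add(n)

    def global_actions(self):
        """All known-but-unvisited nodes are candidates, not just local neighbors."""
        return sorted(n for n in self.edges if n not in self.visited)

    def path_to(self, goal):
        """Breadth-first shortest path; only visited nodes are used as waypoints."""
        parent = {self.current: None}
        queue = deque([self.current])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            if node not in self.visited:
                continue  # do not route through nodes the agent has never reached
            for nxt in sorted(self.edges[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None

    def move(self, goal):
        """Travel to `goal` along the planned path and mark it visited."""
        path = self.path_to(goal)
        if path is None:
            raise ValueError(f"no known route to {goal!r}")
        self.current = goal
        self.visited.add(goal)
        return path
```

With this structure, an agent that has walked from one node to another can still select a node it saw earlier and skipped, and `path_to` returns the backtracking route. That is one simple way to picture the error correction the abstract mentions.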
