LEMON: Language-Based Environment Manipulation via Execution-Guided Pre-training

Paper Title

LEMON: Language-Based Environment Manipulation via Execution-Guided Pre-training

Authors

Shi, Qi; Liu, Qian; Chen, Bei; Zhang, Yu; Liu, Ting; Lou, Jian-Guang

Abstract

Language-based environment manipulation requires agents to manipulate the environment following natural language instructions, which is challenging due to the huge space of the environments. To address this challenge, various approaches have been proposed in recent work. Although these approaches work well for their intended environments, they are difficult to generalize across environments. In this work, we propose LEMON, a general framework for language-based environment manipulation tasks. Specifically, we first specify a task-agnostic approach for language-based environment manipulation tasks, which can deal with various environments using the same generative language model. Then we propose an execution-guided pre-training strategy to inject prior knowledge of environments into the language model with a purely synthetic pre-training corpus. Experimental results on tasks including Alchemy, Scene, Tangrams, ProPara and Recipes demonstrate the effectiveness of LEMON: it achieves new state-of-the-art results on four of the tasks, and the execution-guided pre-training strategy brings remarkable improvements on all experimental tasks.
