Paper Title

Continued Pretraining for Better Zero- and Few-Shot Promptability

Paper Authors

Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy

Paper Abstract

Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work, we investigate if a dedicated continued pretraining stage could improve "promptability", i.e., zero-shot performance with natural language prompts or few-shot performance with prompt tuning. We reveal settings where existing continued pretraining methods lack promptability. We also identify current methodological gaps, which we fill with thorough large-scale experiments. We demonstrate that a simple recipe, continued pretraining that incorporates a trainable prompt during multi-task learning, leads to improved promptability in both zero- and few-shot settings compared to existing methods, up to 31% relative. On the other hand, we find that continued pretraining using MAML-style meta-learning, a method that directly optimizes few-shot promptability, yields subpar performance. We validate our findings with two prompt tuning methods, and, based on our results, we provide concrete recommendations to optimize promptability for different use cases.
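To make the abstract's "simple recipe" concrete, below is a minimal sketch of continued pretraining with a trainable soft prompt prepended to the model's input embeddings, updated jointly with the model weights on a multi-task mixture. This is not the authors' released code: the `PromptedLM` wrapper, the `t5-base` checkpoint, and the prompt length of 20 are illustrative assumptions, with a T5-style encoder-decoder loaded via Hugging Face transformers.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM


class PromptedLM(nn.Module):
    """Seq2seq LM with a trainable soft prompt prepended to the encoder input.

    Illustrative sketch: during multi-task continued pretraining, the prompt
    parameters are optimized jointly with all LM weights.
    """

    def __init__(self, model_name: str = "t5-base", prompt_length: int = 20):
        super().__init__()
        self.lm = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        dim = self.lm.get_input_embeddings().embedding_dim
        # The soft prompt is a free parameter (prompt_length x embed_dim).
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_length, dim))

    def forward(self, input_ids, attention_mask, labels):
        embeds = self.lm.get_input_embeddings()(input_ids)        # (B, T, D)
        prompt = self.prompt.unsqueeze(0).expand(embeds.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, embeds], dim=1)        # (B, P+T, D)
        # Extend the attention mask to cover the prepended prompt positions.
        prompt_mask = attention_mask.new_ones(
            attention_mask.size(0), self.prompt.size(0))
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        # For an encoder-decoder model the labels need no offset, since the
        # prompt only extends the encoder input.
        return self.lm(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask, labels=labels)
```

Under this sketch, the later few-shot adaptation the abstract refers to (prompt tuning) would freeze `self.lm` and update only `self.prompt` on the target task, while zero-shot evaluation uses natural language prompts directly.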
