Paper Title
ANLIzing the Adversarial Natural Language Inference Dataset
Paper Authors
Paper Abstract
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We use these annotations to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models? We hope that our annotations will enable more fine-grained evaluation of models trained on ANLI, provide us with a deeper understanding of where models fail and succeed, and help us determine how to train better models in the future.