Paper Title
ANLIzing the Adversarial Natural Language Inference Dataset
Paper Authors
Paper Abstract
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We use these annotations to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models? We hope that our annotations will enable more fine-grained evaluation of models trained on ANLI, provide us with a deeper understanding of where models fail and succeed, and help us determine how to train better models in the future.