Paper Title
Improving QA Generalization by Concurrent Modeling of Multiple Biases
Paper Authors
Paper Abstract
Existing NLP datasets contain various biases that models can easily exploit to achieve high performance on the corresponding evaluation sets. However, focusing on dataset-specific biases limits models' ability to learn more generalizable knowledge about the task from broader data patterns. In this paper, we investigate the impact of debiasing methods on generalization and propose a general framework for improving performance on both in-domain and out-of-domain datasets by concurrently modeling multiple biases in the training data. Our framework weights each example based on the biases it contains and the strength of those biases in the training data. It then uses these weights in the training objective so that the model relies less on examples with high bias weights. We extensively evaluate our framework on extractive question answering, using training data from various domains that contain multiple biases of different strengths. We perform the evaluations in two settings, in which the model is trained either on a single domain or on multiple domains simultaneously, and show its effectiveness in both settings compared to state-of-the-art debiasing methods.
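The abstract does not spell out the training objective, but the reweighting idea it describes can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions, not the paper's actual method: the per-bias scores from bias-only models, the per-bias strength values, the product-then-max combination, and the `1 - w` loss scaling are all hypothetical stand-ins for the paper's formulation.

```python
import torch
import torch.nn.functional as F

def example_bias_weights(bias_model_probs, bias_strengths):
    """Combine per-bias scores into one bias weight per example.

    bias_model_probs: (batch, num_biases) -- probability that each
        bias-only model assigns to the gold answer; high values mean
        the example is solvable by that bias alone.
    bias_strengths: (num_biases,) -- how strong each bias is in the
        training data, e.g. a bias-only model's training accuracy.
    Both inputs and the product/max combination are illustrative
    assumptions, not the paper's exact formulation.
    """
    return (bias_model_probs * bias_strengths).max(dim=1).values

def bias_weighted_loss(start_logits, end_logits,
                       start_positions, end_positions, weights):
    """Extractive-QA cross entropy, down-weighted for biased examples."""
    # Per-example span losses (no reduction, so each can be reweighted).
    start_loss = F.cross_entropy(start_logits, start_positions, reduction="none")
    end_loss = F.cross_entropy(end_logits, end_positions, reduction="none")
    per_example = 0.5 * (start_loss + end_loss)
    # Examples with high bias weights contribute less to the gradient,
    # pushing the model toward more general patterns.
    return ((1.0 - weights) * per_example).mean()
```

In this sketch, an example that any sufficiently strong bias can solve on its own receives a weight near 1 and is largely ignored by the loss; handling all biases jointly in one weight, rather than one bias at a time, mirrors the concurrent modeling the abstract describes.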