Paper Title

Structurally Diverse Sampling for Sample-Efficient Training and Comprehensive Evaluation

Paper Authors

Shivanshu Gupta, Sameer Singh, Matt Gardner

Paper Abstract

A growing body of research has demonstrated the inability of NLP models to generalize compositionally and has tried to alleviate it through specialized architectures, training schemes, and data augmentation, among other approaches. In this work, we study a different approach: training on instances with diverse structures. We propose a model-agnostic algorithm for subsampling such sets of instances from a labeled instance pool with structured outputs. Evaluating on both compositional template splits and traditional IID splits of 5 semantic parsing datasets of varying complexity, we show that structurally diverse training using our algorithm leads to comparable or better generalization than prior algorithms in 9 out of 10 dataset-split type pairs. In general, we find structural diversity to consistently improve sample efficiency compared to random train sets. Moreover, we show that structurally diverse sampling yields comprehensive test sets that are a lot more challenging than IID test sets. Finally, we provide two explanations for improved generalization from diverse train sets: 1) improved coverage of output substructures, and 2) a reduction in spurious correlations between these substructures.
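To make the core idea concrete, below is a minimal Python sketch of one plausible instantiation of coverage-maximizing diverse subsampling in the spirit the abstract describes. The substructure definition (token n-grams of the linearized output), the function names `output_substructures` and `diverse_subsample`, and the greedy selection criterion are all illustrative assumptions, not the paper's actual algorithm.

```python
def output_substructures(output, n=2):
    """Extract simple substructures from a structured output.

    Here: token n-grams of the linearized logical form. This is an
    assumption for illustration; the paper may define substructures
    differently (e.g., templates or subtrees).
    """
    tokens = output.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def diverse_subsample(pool, k, n=2):
    """Greedily pick k instances whose outputs cover the most
    previously unseen substructures (a coverage heuristic)."""
    covered = set()
    selected = []
    remaining = list(pool)
    for _ in range(min(k, len(remaining))):
        # Score each candidate by how many new substructures it adds.
        best = max(
            remaining,
            key=lambda ex: len(output_substructures(ex["output"], n) - covered),
        )
        selected.append(best)
        covered |= output_substructures(best["output"], n)
        remaining.remove(best)
    return selected

# Usage: pool is a list of labeled (input, structured output) instances.
pool = [
    {"input": "list flights to boston",
     "output": "( lambda x ( flight x ) ( to x boston ) )"},
    {"input": "show fares from denver",
     "output": "( lambda x ( fare x ) ( from x denver ) )"},
    {"input": "flights to boston",
     "output": "( lambda x ( flight x ) ( to x boston ) )"},
]
train_set = diverse_subsample(pool, k=2)
```

A greedy coverage objective like this naturally prefers instances with novel output structure over near-duplicates, which matches the abstract's first explanation for the gains (improved coverage of output substructures); deduplicating structurally similar instances also plausibly weakens spurious correlations between co-occurring substructures, its second explanation.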
