Paper Title

Adam: Dense Retrieval Distillation with Adaptive Dark Examples

Paper Authors

Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

Paper Abstract

To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a query is paired with a positive passage and a batch of negatives. However, through empirical observation, we find that even the hard negatives from advanced methods are still too trivial for the teacher to distinguish, preventing the teacher from transferring abundant dark knowledge to the student through its soft label. To alleviate this issue, we propose ADAM, a knowledge distillation framework that can better transfer the dark knowledge held in the teacher with Adaptive Dark exAMples. Different from previous works that only rely on one positive and hard negatives as candidate passages, we create dark examples that all have moderate relevance to the query through mixing-up and masking in discrete space. Furthermore, as the quality of knowledge held in different training instances varies as measured by the teacher's confidence score, we propose a self-paced distillation strategy that adaptively concentrates on a subset of high-quality instances to conduct our dark-example-based knowledge distillation to help the student learn better. We conduct experiments on two widely-used benchmarks and verify the effectiveness of our method.
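
To make the pipeline concrete, below is a minimal Python sketch of the three ingredients the abstract describes: dark examples built by mixing-up and masking in discrete token space, soft-label distillation over the candidate list, and self-paced filtering by teacher confidence. The function names, mixing ratio, mask rate, and confidence threshold are illustrative assumptions for this sketch, not details taken from the paper.

# Illustrative sketch only; hyperparameter values and helper names are assumed.
import math
import random

MASK_TOKEN = "[MASK]"  # assumed BERT-style mask token

def mix_up(pos_tokens, neg_tokens, mix_ratio=0.5):
    """Dark example: splice tokens from a hard negative into the positive
    passage so the result is only moderately relevant to the query."""
    n = min(len(pos_tokens), len(neg_tokens))
    return [neg_tokens[i] if random.random() < mix_ratio else pos_tokens[i]
            for i in range(n)]

def mask(pos_tokens, mask_rate=0.3):
    """Dark example: mask a fraction of the positive's tokens, weakening
    its relevance to the query."""
    return [MASK_TOKEN if random.random() < mask_rate else t for t in pos_tokens]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the candidate list (positive + dark
    examples + negatives); the teacher's soft label over moderately
    relevant candidates is what carries the dark knowledge."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def self_paced_filter(instances, teacher_confidences, threshold):
    """Self-paced distillation: concentrate on instances where the teacher
    is confident; the threshold can be scheduled over training."""
    return [inst for inst, c in zip(instances, teacher_confidences)
            if c >= threshold]

# Tiny usage example with toy token lists and made-up relevance scores.
pos = "the cat sat on the mat".split()
neg = "dogs bark loudly at night".split()
dark1 = mix_up(pos, neg)   # mixed dark example
dark2 = mask(pos)          # masked dark example
loss = kd_loss(teacher_scores=[3.0, 1.2, 0.9, -0.5],
               student_scores=[2.1, 1.0, 1.0, 0.2])

In a full training loop, each query's candidate list would contain the positive, the generated dark examples, and the negatives, with the teacher's softmax over that list serving as the soft label for the dual-encoder student.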
