Paper Title

LEAD: Liberal Feature-based Distillation for Dense Retrieval

Paper Authors

Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Paper Abstract

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional methods include response-based methods and feature-based methods. Response-based methods are widely used but suffer from a lower upper bound on performance because they ignore intermediate signals, while feature-based methods impose constraints on vocabularies, tokenizers, and model architectures. In this paper, we propose a liberal feature-based distillation method (LEAD). LEAD aligns the distributions between the intermediate layers of the teacher and student models; it is effective, extendable, and portable, and places no requirements on vocabularies, tokenizers, or model architectures. Extensive experiments show the effectiveness of LEAD on widely used benchmarks, including MS MARCO Passage Ranking, TREC 2019 DL Track, MS MARCO Document Ranking, and TREC 2020 DL Track. Our code is available at https://github.com/microsoft/SimXNS/tree/main/LEAD.
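
The abstract does not spell out the training objective, but the core idea of aligning intermediate-layer distributions can be illustrated with a minimal sketch. The code below assumes, hypothetically, that each model pools query and passage embeddings from one intermediate layer, turns dot-product relevance scores into a per-query distribution over candidate passages, and aligns teacher and student distributions with a KL divergence; the function name `layer_distribution_loss` and the dot-product scoring are illustrative assumptions, not the paper's exact formulation (see the linked repository for the actual objective).

```python
import torch
import torch.nn.functional as F

def layer_distribution_loss(teacher_hidden, student_hidden, temperature=1.0):
    """Hypothetical sketch of intermediate-layer distribution alignment.

    teacher_hidden / student_hidden: (q_vec, p_vecs) tuples, where
      q_vec  is [batch, dim] pooled query embeddings from one layer, and
      p_vecs is [batch, num_passages, dim] pooled passage embeddings
      from the same layer. Teacher and student dims may differ, since
      each model's scores live in its own embedding space.
    """
    t_q, t_p = teacher_hidden
    s_q, s_p = student_hidden

    # Dot-product relevance scores: [batch, num_passages]
    t_scores = torch.einsum("bd,bnd->bn", t_q, t_p)
    s_scores = torch.einsum("bd,bnd->bn", s_q, s_p)

    # Softmax turns scores into a distribution over passages per query.
    t_dist = F.softmax(t_scores / temperature, dim=-1)
    s_logp = F.log_softmax(s_scores / temperature, dim=-1)

    # KL(teacher || student); the teacher distribution is the fixed target.
    return F.kl_div(s_logp, t_dist.detach(), reduction="batchmean")
```

Because the distributions are computed over scores within each model's own embedding space rather than over shared hidden features or vocabulary logits, such an objective places no constraints on vocabularies, tokenizers, or hidden dimensions, which is consistent with the portability claim in the abstract.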
