Paper Title
Multi-scale Feature Extraction and Fusion for Online Knowledge Distillation
Authors
Abstract
Online knowledge distillation transfers knowledge among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on prediction distributions and neglect further exploration of representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation. MFEF comprises three key components, Multi-scale Feature Extraction, Dual-attention, and Feature Fusion, which together generate more informative feature maps for distillation. The multi-scale feature extraction applies a divide-and-concatenate strategy along the channel dimension to improve the multi-scale representation ability of the feature maps. To obtain more accurate information, we design a dual-attention mechanism that adaptively strengthens important channels and spatial regions. Moreover, feature fusion aggregates and fuses the previously processed feature maps to assist the training of the student models. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods across various network architectures.
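The three components named in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Learned convolutions are replaced by a fixed 3x3 mean filter. The hierarchical (Res2Net-style) link between channel groups is assumed. The dual attention uses parameter-free channel and spatial gates. Fusion is a fixed weighted average followed by an MSE feature-distillation loss. All function names (`multi_scale_extract`, `dual_attention`, `feature_fusion`, and so on) are hypothetical.

```python
import numpy as np


def box_filter3x3(x):
    """Zero-padded 3x3 mean filter applied to each channel of x (C, H, W).

    Hypothetical stand-in for a learned 3x3 convolution.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def multi_scale_extract(x, scales=4):
    """Divide channels into `scales` groups, filter them hierarchically, concatenate.

    Assumption: group i is filtered after adding group i-1's output (Res2Net-style),
    so later groups see progressively larger receptive fields.
    """
    groups = np.split(x, scales, axis=0)  # raises ValueError if C % scales != 0
    outs = [groups[0]]                    # first group is passed through unchanged
    prev = None
    for g in groups[1:]:
        prev = box_filter3x3(g if prev is None else g + prev)
        outs.append(prev)
    return np.concatenate(outs, axis=0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dual_attention(x):
    """Reweight channels, then spatial positions (parameter-free, CBAM-like sketch)."""
    # Channel attention: one gate per channel from global average pooling.
    channel_gate = sigmoid(x.mean(axis=(1, 2)))[:, None, None]   # (C, 1, 1)
    x = x * channel_gate
    # Spatial attention: one gate per location from channel-wise mean and max.
    spatial_gate = sigmoid(x.mean(axis=0) + x.max(axis=0))[None]  # (1, H, W)
    return x * spatial_gate


def feature_fusion(features, weights=None):
    """Fuse the students' processed maps into one target map.

    Assumption: a fixed weighted average stands in for a learned fusion module.
    """
    stacked = np.stack(features)                                  # (S, C, H, W)
    if weights is None:
        weights = np.full(len(features), 1.0 / len(features))
    return np.tensordot(weights, stacked, axes=1)                 # (C, H, W)


def feature_distillation_loss(student_map, fused_map):
    """Mean squared error to the fused map (treated as a constant target)."""
    return float(np.mean((student_map - fused_map) ** 2))


# Usage: two hypothetical students, each producing an 8-channel 6x6 feature map.
rng = np.random.default_rng(0)
raw_maps = [rng.standard_normal((8, 6, 6)) for _ in range(2)]
processed = [dual_attention(multi_scale_extract(f, scales=4)) for f in raw_maps]
fused = feature_fusion(processed)
losses = [feature_distillation_loss(p, fused) for p in processed]
print(fused.shape, losses)
```

The hierarchical split is what produces the multi-scale effect in this sketch. Each later channel group has passed through one more 3x3 filter than the group before it, so its effective receptive field is larger. In a real training loop, the fused map would presumably be treated as a fixed target when computing each student's loss. The abstract does not state how MFEF weights this term against the prediction-based losses.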