Paper Title

Feature-map-level Online Adversarial Knowledge Distillation

Authors

Inseop Chung, SeongUk Park, Jangho Kim, Nojun Kwak

Abstract

Feature maps contain rich information about image intensity and spatial correlation. However, previous online knowledge distillation methods utilize only the class probabilities. In this paper, we therefore propose an online knowledge distillation method that transfers not only the knowledge of the class probabilities but also that of the feature maps, using an adversarial training framework. We train multiple networks simultaneously, employing discriminators to distinguish the feature-map distributions of different networks. Each network has its corresponding discriminator, which classifies the network's own feature map as fake while classifying that of the other network as real. By training a network to fool its corresponding discriminator, it learns the other network's feature-map distribution. We show that our method performs better than conventional direct alignment methods such as the L1 loss and is more suitable for online distillation. We also propose a novel cyclic learning scheme for training more than two networks together. We have applied our method to various network architectures on the classification task and observed significant performance improvements, especially when training a pair of a small network and a large one.
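To make the training scheme described in the abstract concrete, below is a minimal PyTorch sketch of the per-network distillation loss for a pair of networks: a small discriminator treats the peer's feature map as real and the network's own as fake, and the network is trained to fool it, alongside the usual mutual distillation on softened logits. All names here (FeatureDiscriminator, distill_pair, the discriminator architecture, the temperature value) are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    """Small CNN that scores a feature map as coming from the peer
    network ("real", label 1) or from its own network ("fake", label 0)."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, 1) probability in [0, 1]
        return torch.sigmoid(self.body(fmap)).flatten(1)


def distill_pair(feat_own, feat_peer, logits_own, logits_peer,
                 disc, opt_disc, temperature=3.0):
    """Distillation loss for one network in a pair: adversarial
    feature-map loss plus KL divergence on softened class probabilities."""
    bce = F.binary_cross_entropy

    # 1) Update this network's discriminator: the peer's feature map is
    #    "real", its own feature map is "fake". Inputs are detached so
    #    only the discriminator's parameters receive gradients here.
    real_score = disc(feat_peer.detach())
    fake_score = disc(feat_own.detach())
    d_loss = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) Generator-side loss: the network tries to fool its own
    #    discriminator, pulling its feature-map distribution toward the
    #    peer's instead of matching it element-wise (as L1 would).
    fool_score = disc(feat_own)
    adv_loss = bce(fool_score, torch.ones_like(fool_score))

    # 3) Mutual distillation on class probabilities, with the peer's
    #    softened logits as the target (standard online-KD term).
    kd_loss = F.kl_div(
        F.log_softmax(logits_own / temperature, dim=1),
        F.softmax(logits_peer.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return adv_loss + kd_loss

For more than two networks, one plausible reading of the cyclic scheme is to pair each network with its successor in a ring, e.g. pairs = [(i, (i + 1) % K) for i in range(K)], so that each network distills from exactly one peer and knowledge propagates around the cycle; the exact pairing used in the paper may differ.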
