Paper Title

Exploring Simple Siamese Representation Learning

Authors

Xinlei Chen, Kaiming He

Abstract

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code will be made available.
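
As a quick illustration of the mechanism the abstract describes: the loss compares a prediction of one view against a stop-gradient copy of the other view's representation, symmetrized over the two views. The sketch below is not the authors' released code; it is a minimal PyTorch illustration in which `encoder` and `predictor` are hypothetical toy MLP stand-ins (with made-up dimensions) for the paper's encoder f and prediction head h.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def neg_cosine(p, z):
    # Stop-gradient: the target branch z is treated as a constant,
    # so gradients flow only through the prediction p.
    z = z.detach()
    return -F.cosine_similarity(p, z, dim=-1).mean()

# Toy stand-ins (hypothetical sizes) for the backbone-plus-projector f
# and the prediction MLP h described in the paper.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))
predictor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 64))

def simsiam_loss(x1, x2):
    # x1, x2: two augmented views of the same batch of inputs.
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)
    # Symmetrized loss: each view's prediction is matched to the
    # other view's (gradient-stopped) representation. No negative
    # pairs, large batches, or momentum encoder are involved.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Smoke test with random "views" standing in for augmented images.
x1, x2 = torch.randn(8, 128), torch.randn(8, 128)
loss = simsiam_loss(x1, x2)
loss.backward()
```

The single `detach()` call is the stop-gradient operation; per the abstract, removing it exposes the collapsing solutions that otherwise exist for this loss and structure.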
