Paper Title

Learning Translation Invariance in CNNs

Paper Authors

Valerio Biscione, Jeffrey Bowers

Paper Abstract

When seeing a new object, humans can immediately recognize it across different retinal locations: we say that the internal object representation is invariant to translation. It is commonly believed that Convolutional Neural Networks (CNNs) are architecturally invariant to translation thanks to the convolution and/or pooling operations they are endowed with. In fact, several works have found that these networks systematically fail to recognise new objects on untrained locations. In this work we show how, even though CNNs are not 'architecturally invariant' to translation, they can indeed 'learn' to be invariant to translation. We verified that this can be achieved by pretraining on ImageNet, and we found that it is also possible with much simpler datasets in which the items are fully translated across the input canvas. We investigated how this pretraining affected the internal network representations, finding that the invariance was almost always acquired, even though it was sometimes disrupted by further training due to catastrophic forgetting/interference. These experiments show how pretraining a network on an environment with the right 'latent' characteristics (a more naturalistic environment) can result in the network learning deep perceptual rules which would dramatically improve subsequent generalization.
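The kind of test the abstract describes can be illustrated with a small probe: compare a CNN's internal representation of the same object placed at two different canvas locations, for a network pretrained on ImageNet versus one with random weights. The sketch below is not the authors' code; the ResNet-18 backbone, the 224x224 canvas, the two placement positions, and the cosine-similarity measure are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): probe translation invariance of a CNN's
# penultimate-layer representation by comparing the same object at two canvas positions.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def place_on_canvas(obj_img, canvas_size=224, obj_size=64, x=0, y=0):
    """Paste a small object image onto a blank canvas at position (x, y)."""
    canvas = Image.new("RGB", (canvas_size, canvas_size))
    canvas.paste(obj_img.resize((obj_size, obj_size)), (x, y))
    return canvas

to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ImageNet-pretrained network vs. a randomly initialised one.
pretrained = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
untrained = models.resnet18(weights=None).eval()

def penultimate_features(model, image):
    """Return the pooled features just before the classification layer."""
    feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])  # drop fc
    with torch.no_grad():
        return feature_extractor(to_tensor(image).unsqueeze(0)).flatten(1)

def translation_invariance(model, obj_img):
    """Cosine similarity between representations of the object at two locations."""
    left = penultimate_features(model, place_on_canvas(obj_img, x=0, y=80))
    right = penultimate_features(model, place_on_canvas(obj_img, x=160, y=80))
    return F.cosine_similarity(left, right).item()

# Usage (hypothetical image file):
# obj = Image.open("object.png").convert("RGB")
# print(translation_invariance(pretrained, obj), translation_invariance(untrained, obj))
```

If pretraining induces translation invariance, the pretrained network's similarity score should be close to 1, while the randomly initialised network typically scores noticeably lower, consistent with the claim that invariance is learned rather than built into the architecture.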
