Paper Title

On the surprising similarities between supervised and self-supervised models

Paper Authors

Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

Abstract

How do humans learn to acquire a powerful, flexible and robust representation of objects? While much of this process remains unknown, it is clear that humans do not require millions of object labels. Excitingly, recent algorithmic advancements in self-supervised learning now enable convolutional neural networks (CNNs) to learn useful visual object representations without supervised labels, too. In the light of this recent breakthrough, we here compare self-supervised networks to supervised models and human behaviour. We tested models on 15 generalisation datasets for which large-scale human behavioural data is available (130K highly controlled psychophysical trials). Surprisingly, current self-supervised CNNs share four key characteristics of their supervised counterparts: (1.) relatively poor noise robustness (with the notable exception of SimCLR), (2.) non-human category-level error patterns, (3.) non-human image-level error patterns (yet high similarity to supervised model errors) and (4.) a bias towards texture. Taken together, these results suggest that the strategies learned through today's supervised and self-supervised training objectives end up being surprisingly similar, but distant from human-like behaviour. That being said, we are clearly just at the beginning of what could be called a self-supervised revolution of machine vision, and we are hopeful that future self-supervised models behave differently from supervised ones, and---perhaps---more similar to robust human object recognition.
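
The third finding above (image-level error patterns that are unlike humans' but highly similar to those of supervised models) rests on quantifying how similar two decision makers' trial-by-trial errors are. One standard way to do this is an error-consistency score in the style of Cohen's kappa: compare the observed agreement on correct-vs-incorrect decisions against the agreement expected from the two accuracies alone. The sketch below is an illustrative NumPy reconstruction of that idea, not the paper's evaluation code; the function name and the toy data are assumptions made here for demonstration.

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    """Cohen's-kappa-style error consistency between two observers/models.

    correct_a, correct_b: boolean arrays with one entry per trial,
    True where the respective observer classified the image correctly.
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)

    # Observed consistency: fraction of trials on which both are right or both are wrong.
    c_obs = np.mean(correct_a == correct_b)

    # Expected consistency if errors were independent, given each observer's accuracy.
    p_a, p_b = correct_a.mean(), correct_b.mean()
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    # Kappa: agreement beyond what the accuracies alone would predict.
    return (c_obs - c_exp) / (1 - c_exp)

if __name__ == "__main__":
    # Toy example (made-up data): two models with largely overlapping errors.
    m1 = [True, True, False, True, False, True, True, False, True, True]
    m2 = [True, False, False, True, False, True, True, False, True, True]
    print(error_consistency(m1, m2))  # ~0.78: far more consistent than chance
```

A kappa near 0 means the two sets of errors overlap no more than their accuracies would predict by chance, while values approaching 1 indicate that they fail on largely the same images, which is the pattern the abstract reports between self-supervised and supervised CNNs.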
