Paper Title
Depth and Representation in Vision Models
Authors
Abstract
Deep learning models develop successive representations of their input in sequential layers, the last of which maps the final representation to the output. Here we investigate the informational content of these representations by observing the ability of convolutional image classification models to autoencode the model's input using embeddings present at various layers. We find that the deeper the layer, the less accurate that layer's representation of the input is before training. Inaccurate representation results from non-uniqueness, in which various distinct inputs yield approximately the same embedding. Non-unique representation is a consequence of both exact and approximate non-invertibility of transformations in the forward pass. Learning to classify natural images increases representation clarity in early but not late layers, which instead form abstract images. Rather than simply selecting for the input features necessary for classification, deep-layer representations are found to transform the input so that it matches representations of the training data, such that arbitrary inputs are mapped onto manifolds learned during training. This work supports the theory that the tasks of image recognition and input generation are inseparable, even for models trained exclusively to classify.
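The non-uniqueness claim above can be illustrated with a minimal toy model. The sketch below uses a single random linear map as a stand-in for a convolutional layer (the dimensions, seed, and variable names are illustrative assumptions, not the paper's actual setup): because the embedding dimension is smaller than the input dimension, the map is non-invertible, and gradient descent on the input recovers a point with the correct embedding but not the original input.

```python
import numpy as np

# Toy stand-in for one layer: a random linear map from a 64-dim "image"
# to a 16-dim embedding. Reducing dimension makes the map non-invertible,
# so many distinct inputs share the same embedding (non-uniqueness).
rng = np.random.default_rng(0)
n, m = 64, 16
W = rng.normal(size=(m, n)) / np.sqrt(n)

x_true = rng.normal(size=n)   # the input we will try to recover
e = W @ x_true                # its embedding at this "layer"

# Gradient-descent inversion: minimise f(x) = ||W x - e||^2 over the
# input, analogous to reconstructing an image from a layer embedding.
x = np.zeros(n)
lr = 0.2
for _ in range(500):
    x -= lr * 2.0 * W.T @ (W @ x - e)

emb_err = np.linalg.norm(W @ x - e)    # small: the embedding is matched
inp_err = np.linalg.norm(x - x_true)   # large: the input is not recovered
print(f"embedding error: {emb_err:.2e}, input error: {inp_err:.2f}")
```

The recovered input matches the target embedding almost exactly while remaining far from the original input: the component of `x_true` in the null space of `W` is simply lost, which is the linear analogue of the exact non-invertibility the abstract attributes to forward-pass transformations.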