Paper Title
Revisiting Sparse Convolutional Model for Visual Recognition
Paper Authors
Paper Abstract
Despite strong empirical performance on image classification, deep neural networks are often regarded as ``black boxes'' that are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be expressed as a linear combination of a few elements from a convolutional dictionary, are powerful tools for analyzing natural images with good theoretical interpretability and biological plausibility. However, such principled models have not demonstrated competitive performance compared with empirically designed deep networks. This paper revisits sparse convolutional modeling for image classification and bridges the gap between the good empirical performance of deep learning and the good interpretability of sparse convolutional models. Our method uses differentiable optimization layers, defined via convolutional sparse coding, as drop-in replacements for standard convolutional layers in conventional deep neural networks. We show that such models have equally strong empirical performance on the CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks. By leveraging the stable recovery property of sparse modeling, we further show that such models can be made much more robust to input corruptions as well as adversarial perturbations at test time through a simple, proper trade-off between the sparse regularization and data reconstruction terms. Source code can be found at https://github.com/Delay-Xili/SDNet.
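To make the core idea concrete, below is a minimal sketch of a convolutional sparse coding layer solved by a few unrolled ISTA (proximal gradient) steps, usable as a drop-in replacement for a standard Conv2d. This is an illustrative reconstruction under stated assumptions, not the paper's actual implementation (see the linked SDNet repository for that); all names here (`CSCLayer`, `num_iters`, `lam`, `step`) are hypothetical.

```python
# Hypothetical sketch: a convolutional sparse coding (CSC) layer whose output
# is the sparse code z approximately minimizing
#     0.5 * ||x - D * z||^2 + lam * ||z||_1,
# where D * z denotes a transposed convolution with dictionary D.
# The real SDNet layer may differ in solver, step size, and parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CSCLayer(nn.Module):
    """Drop-in replacement for nn.Conv2d: returns the sparse code z
    instead of a plain convolution response."""
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, num_iters=2, lam=0.1):
        super().__init__()
        # D has shape (out_channels, in_channels, k, k); each of the
        # out_channels atoms synthesizes an in_channels-deep patch.
        self.D = nn.Parameter(
            0.02 * torch.randn(out_channels, in_channels,
                               kernel_size, kernel_size))
        self.stride, self.padding = stride, padding
        self.num_iters = num_iters
        self.lam = lam    # sparsity vs. reconstruction trade-off
        self.step = 0.1   # ISTA step size (could also be learned)

    def synthesize(self, z):
        # D * z: reconstruct the input from the sparse code.
        return F.conv_transpose2d(z, self.D, stride=self.stride,
                                  padding=self.padding)

    def forward(self, x, lam=None):
        # Allowing lam to be overridden at test time mirrors the abstract's
        # trade-off between sparse regularization and data reconstruction.
        lam = self.lam if lam is None else lam
        # Initialize z with one analysis pass (like a standard conv layer).
        z = F.conv2d(x, self.D, stride=self.stride, padding=self.padding)
        for _ in range(self.num_iters):
            residual = self.synthesize(z) - x
            grad = F.conv2d(residual, self.D, stride=self.stride,
                            padding=self.padding)
            # Proximal gradient step: soft-thresholding enforces sparsity.
            z = F.softshrink(z - self.step * grad, lambd=self.step * lam)
        return z
```

Because the whole layer is built from differentiable convolutions and soft-thresholding, gradients flow through the unrolled iterations and the dictionary D is trained end-to-end with the rest of the network. At test time, increasing `lam` trades reconstruction fidelity for sparser codes, which is the kind of knob the abstract credits for improved robustness to corrupted or adversarially perturbed inputs.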