Paper Title
Towards Deep Learning Models Resistant to Large Perturbations
Paper Authors
Paper Abstract
Adversarial robustness has proven to be a required property of machine learning algorithms. A key and often overlooked aspect of this problem is making the adversarial noise magnitude as large as possible, which enhances the benefits of model robustness. We show that the well-established algorithm known as "adversarial training" fails to train a deep neural network given a large, but reasonable, perturbation magnitude. In this paper, we propose a simple yet effective initialization of the network weights that makes learning at higher noise levels possible. We then rigorously evaluate this idea on the MNIST ($\epsilon$ up to $\approx 0.40$) and CIFAR10 ($\epsilon$ up to $\approx 32/255$) datasets under the $\ell_{\infty}$ attack model. Additionally, to establish the limits of $\epsilon$ within which learning is feasible, we study the optimal robust classifier assuming full access to the joint data and label distribution. Finally, we provide theoretical results on the adversarial accuracy for a simple multi-dimensional Bernoulli distribution, which yield insight into the range of feasible perturbations for the MNIST dataset.
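For context, the "adversarial training" referenced above is the standard saddle-point formulation of robust optimization: the network parameters $\theta$ are trained against a worst-case $\ell_{\infty}$-bounded perturbation $\delta$ of each input. Writing $L$ for the training loss and $\mathcal{D}$ for the data distribution (notation assumed here, not taken from the paper), the objective is

$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_{\infty} \le \epsilon} L(\theta, x + \delta, y) \Big]$$

The inner maximization is commonly approximated with projected gradient descent (PGD). Below is a minimal PyTorch sketch of one PGD-based adversarial training step under the $\ell_{\infty}$ model; the function names, the 10-step attack, and the step-size heuristic alpha = 2.5 * eps / steps are illustrative assumptions, and the paper's proposed weight initialization is not reproduced here.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    # Approximate the inner maximization: ascend the loss from a random
    # start, projecting delta back onto the l_inf ball of radius eps and
    # keeping x + delta inside the valid pixel range [0, 1].
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = (x + delta).clamp(0, 1) - x
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = delta + alpha * grad.sign()   # signed-gradient ascent step
            delta = delta.clamp(-eps, eps)        # project onto the l_inf ball
            delta = (x + delta).clamp(0, 1) - x   # project onto the image box
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, eps):
    # Outer minimization: train on the adversarially perturbed batch.
    delta = pgd_attack(model, x, y, eps, alpha=2.5 * eps / 10, steps=10)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch, eps corresponds to the perturbation budgets quoted above: up to about 0.40 for MNIST and up to about 32/255 for CIFAR10.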