Paper Title
Are L2 adversarial examples intrinsically different?
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) have achieved notable success in various tasks, including many security-critical scenarios. However, a considerable body of work has demonstrated their vulnerability to adversaries. Through theoretical analysis, we unravel the properties that can intrinsically differentiate adversarial examples from normal inputs. That is, adversarial examples generated by $L_2$ attacks usually have larger input sensitivity, which can be used to identify them efficiently. We also find that those generated by $L_\infty$ attacks differ enough in the pixel domain to be detected empirically. To verify our analysis, we propose a \textbf{G}uided \textbf{C}omplementary \textbf{D}efense module (\textbf{GCD}) that integrates detection and recovery processes. Compared with existing adversarial detection methods, our detector achieves a detection AUC above 0.98 against most attacks. When comparing our guided rectifier with commonly used adversarial training methods and other rectification methods, our rectifier outperforms them by a large margin. We achieve a recovered classification accuracy of up to 99\% on MNIST, 89\% on CIFAR-10, and 87\% on ImageNet subsets against $L_2$ attacks. Furthermore, under the white-box setting, our holistic defense module shows a promising degree of robustness. Thus, we confirm that at least $L_2$ adversarial examples are intrinsically different enough from normal inputs, both theoretically and empirically. We also shed light on designing simple yet effective defense methods based on these properties.
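The abstract's core detection idea is that $L_2$ adversarial examples exhibit larger input sensitivity than normal inputs. Below is a minimal sketch of what such a detector could look like, assuming the sensitivity statistic is the $L_2$ norm of the loss gradient with respect to the input; the function names, the use of the model's own prediction as the label, and the threshold calibration are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of input-sensitivity-based detection (PyTorch).
# Assumption: sensitivity = L2 norm of d(loss)/d(input) per example.
import torch
import torch.nn.functional as F

def input_sensitivity(model, x):
    """Return the L2 norm of the loss gradient w.r.t. each input."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Use the model's own prediction as the label, since true labels
    # are unavailable at detection time (an assumption of this sketch).
    pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pred)
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(p=2, dim=1)

def detect(model, x, threshold):
    """Flag inputs whose sensitivity exceeds a calibrated threshold."""
    return input_sensitivity(model, x) > threshold
```

In practice, the threshold would be calibrated on a held-out set of clean inputs (e.g., chosen to fix a target false-positive rate), and the resulting score could feed the detection stage of a detect-then-recover pipeline like the GCD module described above.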