Paper Title

Defending Adversarial Attacks via Semantic Feature Manipulation

Authors

Shuo Wang, Tianle Chen, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen

Abstract

Machine learning models have demonstrated vulnerability to adversarial attacks, more specifically misclassification of adversarial examples. In this paper, we propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples in an interpretable and efficient manner. The intuition is that the classification result of a normal image is generally resistant to non-significant intrinsic feature changes, e.g., varying thickness of handwritten digits. In contrast, adversarial examples are sensitive to such changes since the perturbation lacks transferability. To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features. The resistance to classification change over the morphs, derived by varying and reconstructing latent codes, is used to detect suspicious inputs. Further, the combo-VAE is enhanced to purify adversarial examples with good quality by considering both class-shared and class-unique features. We empirically demonstrate the effectiveness of detection and the quality of purified instances. Our experiments on three datasets show that FM-Defense can detect nearly $100\%$ of adversarial examples produced by different state-of-the-art adversarial attacks. It achieves more than $99\%$ overall purification accuracy on suspicious instances that are close to the manifold of normal examples.
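The detection idea described above (vary the disentangled latent code, reconstruct the morphs, and flag inputs whose prediction is not resistant to those changes) can be illustrated with a minimal sketch. Note that the `encoder`, `decoder`, and `classifier` stand-ins below, the morph step sizes, and the 0.9 threshold are illustrative assumptions, not the paper's trained combo-VAE or its reported settings.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins; in FM-Defense these would be the trained
# combo-VAE encoder/decoder and the protected classifier.
latent_dim, img_dim, num_classes = 8, 784, 10
encoder = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, img_dim))
classifier = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))


def resistance_score(x, num_steps=5, scale=0.5):
    """Vary each latent dimension of x, reconstruct the morphs, and return
    the fraction of morphs whose predicted class matches the original
    prediction (higher = more resistant to semantic feature changes)."""
    with torch.no_grad():
        z = encoder(x)                              # disentangled latent code
        original_pred = classifier(x).argmax(dim=-1)
        unchanged, total = 0, 0
        for dim in range(latent_dim):
            for step in torch.linspace(-scale, scale, num_steps):
                z_morph = z.clone()
                z_morph[:, dim] += step             # vary one semantic feature
                x_morph = decoder(z_morph)          # reconstruct the morph
                morph_pred = classifier(x_morph).argmax(dim=-1)
                unchanged += (morph_pred == original_pred).sum().item()
                total += original_pred.numel()
    return unchanged / total


# A normal input tends to keep its label across morphs, while an adversarial
# perturbation does not transfer to the morphs, so its label flips more often.
x = torch.rand(1, img_dim)
score = resistance_score(x)
suspicious = score < 0.9  # threshold is hypothetical, not from the paper
print(f"resistance={score:.2f}, suspicious={suspicious}")
```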
