Paper Title

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Paper Authors

Zhen Xiang, David J. Miller, George Kesidis

Paper Abstract

Backdoor data poisoning is an emerging form of adversarial attack, usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the same backdoor pattern is present; 2) maintain high classification accuracy on backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse-engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse-engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10, for a large variety of attacks, our defense achieves a new state-of-the-art by reducing the attack success rate to no more than 4.9% after removing detected suspicious training images.
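To make the idea of reverse-engineering a backdoor pattern concrete, below is a minimal, hypothetical sketch of the kind of optimization such a defense runs for one candidate (source, target) class pair: it searches for a small additive perturbation that flips source-class images to the target class, with a norm penalty encouraging imperceptibility. This is not the paper's exact objective, and all names (`classifier`, `source_images`, `target_class`, the hyperparameters) are illustrative placeholders.

```python
# Hypothetical sketch of backdoor-pattern reverse-engineering for one
# (source, target) class pair. Not the authors' exact method; it only
# illustrates the general optimization idea described in the abstract.
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(classifier, source_images, target_class,
                             steps=300, lr=0.01, lam=1e-2):
    """Estimate a small additive pattern that pushes source-class images
    toward the target class; return the pattern and its success rate."""
    classifier.eval()
    # Candidate additive pattern, same shape as one image, optimized directly.
    v = torch.zeros_like(source_images[0], requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    target = torch.full((source_images.shape[0],), target_class, dtype=torch.long)

    for _ in range(steps):
        opt.zero_grad()
        perturbed = torch.clamp(source_images + v, 0.0, 1.0)  # keep valid pixel range
        logits = classifier(perturbed)
        # Encourage misclassification to the target class while keeping the
        # pattern small (imperceptible) via an L2 penalty on v.
        loss = F.cross_entropy(logits, target) + lam * v.norm()
        loss.backward()
        opt.step()

    with torch.no_grad():
        preds = classifier(torch.clamp(source_images + v, 0.0, 1.0)).argmax(dim=1)
        success_rate = (preds == target_class).float().mean().item()
    return v.detach(), success_rate
```

In a defense of this flavor, a class pair for which an unusually small pattern already induces a high group misclassification rate is evidence that the training set may be poisoned toward that target class; the paper's full method additionally uses the estimated pattern to flag and remove the suspicious training images.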
