Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Paper Title

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Authors

Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang

Abstract

Trojan attacks threaten deep neural networks (DNNs) by poisoning them so that they behave normally on most samples yet produce manipulated results for inputs carrying a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks that, after independent training, reach performance competitive with the dense network. Connecting these two dots, we investigate the problem of Trojan DNN detection through the new lens of sparsity, even when no clean training data is available. Our crucial observation is that Trojan features are significantly more stable under network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket", which preserves nearly full Trojan information yet achieves only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. Extensive experiments on various datasets (CIFAR-10, CIFAR-100, and ImageNet) with different network architectures (VGG-16, ResNet-18, ResNet-20s, and DenseNet-100) demonstrate the effectiveness of our proposal. Code is available at https://github.com/VITA-Group/Backdoor-LTH.
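
The abstract's detection regime depends on pruning a network down to a sparse subnetwork. As a rough illustration of what such a sparse mask looks like, the sketch below applies one round of magnitude pruning to a flat list of weights. It is not the authors' implementation (see their repository for that): the abstract does not name the pruning criterion, so magnitude pruning is an assumption here, and the function names `magnitude_prune_mask` and `apply_mask` and the toy weights are hypothetical.

```python
def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    smallest-magnitude weights (assumed criterion, for illustration only)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out the weights whose mask entry is 0; keep the rest unchanged."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


# Toy example: six made-up weights, half of them pruned.
weights = [0.05, -1.2, 0.3, -0.01, 0.8, 0.002]
mask = magnitude_prune_mask(weights, sparsity=0.5)
sparse_weights = apply_mask(weights, mask)
print(mask)            # [0, 1, 1, 0, 1, 0]
print(sparse_weights)  # [0.0, -1.2, 0.3, 0.0, 0.8, 0.0]
```

In the setting the abstract describes, one would then compare how clean accuracy and attack success rate change as sparsity grows; that evaluation needs a trained model and data and is outside this sketch.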
