Paper Title


Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption

Paper Authors

Ivan Evtimov, Russel Howes, Brian Dolhansky, Hamed Firooz, Cristian Canton Ferrer

Paper Abstract


This work examines the vulnerability of multimodal (image + text) models to adversarial threats similar to those discussed in previous literature on unimodal (image- or text-only) models. We introduce realistic assumptions of partial model knowledge and access, and discuss how these assumptions differ from the standard "black-box"/"white-box" dichotomy common in current literature on adversarial attacks. Working under various levels of these "gray-box" assumptions, we develop new attack methodologies unique to multimodal classification and evaluate them on the Hateful Memes Challenge classification task. We find that attacking multiple modalities yields stronger attacks than unimodal attacks alone (inducing errors in up to 73% of cases), and that the unimodal image attacks on multimodal classifiers we explored were stronger than character-based text augmentation attacks (inducing errors on average in 45% and 30% of cases, respectively).
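The character-based text augmentation attacks mentioned above perturb the meme's caption at the character level before it reaches the text encoder of the multimodal classifier. The sketch below is only a minimal illustration of that general idea, not the authors' implementation; the choice of operations (drop/swap/substitute), the perturbation rate, and the function name are assumptions.

```python
import random


def perturb_caption(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level augmentation sketch: randomly drop, swap, or
    substitute characters in a fraction of words. Illustrative only;
    the paper's actual augmentation strategy and rates may differ."""
    rng = random.Random(seed)
    perturbed = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            op = rng.choice(["drop", "swap", "sub"])
            i = rng.randrange(1, len(word) - 1)  # keep first/last characters intact
            if op == "drop":
                word = word[:i] + word[i + 1:]
            elif op == "swap":
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            else:  # substitute a random lowercase letter
                word = word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]
        perturbed.append(word)
    return " ".join(perturbed)


# Example: perturb a caption before passing it to a multimodal classifier.
print(perturb_caption("an example meme caption to perturb", rate=0.5))
```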
