Generalized adversarial defense against unseen attacks: a survey

Zhou Dawei1, Xu Yibo1, Wang Nannan1, Liu Decheng1, Peng Chunlei1, Gao Xinbo2 (1. Xidian University; 2. Chongqing University of Posts and Telecommunications)

Abstract
In computer vision, an adversarial example is a sample that contains a perturbation carefully crafted by an attacker; the difference between such a sample and its corresponding natural sample is usually imperceptible to the human eye, yet it can easily cause a deep learning model to output wrong results. This vulnerability of deep learning models has drawn wide attention, and the corresponding adversarial defense techniques have developed rapidly in recent years. However, as attack techniques and application environments continue to evolve, achieving robustness against only specific types of adversarial perturbations can no longer meet the performance requirements of deep learning models. An urgent problem, therefore, is to defend once and for all against arbitrary kinds of unseen attacks through more efficient training procedures and fewer training runs, while relying on adversarial examples as little as possible. Here, the unseen attacks to be defended against are expected to be as unknown as possible, differing as thoroughly as possible in both principle and performance from the attacks introduced during training. To better understand the current state of defense techniques against unseen attacks, this survey takes the above defense goal as its core and comprehensively and systematically summarizes the research in this area. We first briefly introduce the research background and the difficulties and challenges faced by defense research. On this basis, we divide defenses against unseen adversarial attacks into training-mechanism-oriented methods and model-architecture-oriented methods. For training-mechanism-oriented methods, we review related work from three perspectives according to the basic training framework involved: adversarial training, standard training, and contrastive learning. For model-architecture-oriented methods, we analyze related studies from two perspectives according to how the model structure is modified: structure optimization of the target model and input data pre-processing. Finally, we analyze the patterns of existing research on defenses against unseen attacks, introduce other related defense research directions, and reveal the overall development trend of this field. Unlike general surveys on adversarial defense, this survey focuses on surveying and analyzing defenses against attacks that are unknown to the greatest possible extent, which places higher demands on the generalization and universality of defense mechanisms. We hope this survey can provide useful insights for future research on defense mechanisms.
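To make the threat concrete, the following is a minimal PyTorch sketch of how an adversarial example is typically crafted with the fast gradient sign method (FGSM); the classifier model, input batch x, and labels y are assumed placeholders, and the budget epsilon is illustrative:

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    # Take one signed-gradient step that increases the classification
    # loss, keeping the perturbation within an L-infinity ball.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Clip back to the valid image range; the result usually looks
    # identical to x but can flip the model's prediction.
    return x_adv.clamp(0, 1).detach()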
Extended Abstract
Deep learning models have achieved impressive breakthroughs in many areas in recent years. However, they have been shown to be vulnerable when their inputs carry imperceptible adversarial noise, which can easily lead to wrong outputs. To tackle this problem, many defense methods have been proposed to mitigate such threats to deep neural networks. As adversaries keep improving their techniques for disrupting model performance, more and more attacks that are unseen by the model during training are emerging, so defense mechanisms that guard against only specific types of adversarial perturbations are becoming less robust. The ability of a model to defend against a wide range of unseen attacks is therefore pivotal. We emphasize that the unseen attacks should differ as much as possible from the attacks used during training, in both principle and attack performance, rather than merely in the parameter settings of the same attack method. The core goal is to defend against arbitrary attacks via efficient training procedures, while keeping the defense as independent as possible of the adversarial attacks used during training.

This survey summarizes and analyzes the existing adversarial defense methods against unseen adversarial attacks. To start with, we briefly review the background of defending against unseen attacks. One main reason a model can be robust against unseen attacks is that it extracts robust features through a specially designed training mechanism, without an explicitly designed defense module of a special internal structure. It is also possible to obtain a robust model by modifying its structure or designing additional modules. We therefore divide these methods into two categories: (1) training-mechanism-based defenses and (2) model-structure-based defenses.

Training-mechanism-based defenses mainly seek to improve the quality of the robust features extracted by the model during training. 1) Adversarial training is one of the most effective defense strategies, but it easily overfits to specific types of adversarial noise; a minimal sketch of one adversarial-training step is given after this overview. Well-designed training attacks can explicitly improve the model's ability to explore the perturbation space during training, which directly helps it learn more representative features than traditional adversarial attacks do. Adding regularization terms is another way to obtain robust models by refining the robust features learned in the basic training process. Furthermore, we introduce adversarial-training-based methods that incorporate knowledge from other domains, such as domain adaptation, pre-training, and fine-tuning. Since different examples contribute differently to the model's robustness, example reweighting is yet another route to robustness against attacks. 2) Standard training is the most basic training paradigm in deep learning. Data augmentation improves example diversity under standard training, while adding regularization terms to standard training stabilizes the model's outputs. Pre-training strategies aim to obtain a model that is robust within a predefined perturbation bound. 3) Contrastive learning is also a useful strategy, because its core idea of feature similarity matches the goal of learning representative robust features.
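As referenced above, here is a minimal PyTorch sketch of one adversarial-training step in the style of PGD-based training (Madry et al.); model, optimizer, and the batch (x, y) are assumed placeholders, and the hyperparameters are illustrative rather than the setting of any particular method surveyed here:

import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y,
                              epsilon=8 / 255, alpha=2 / 255, steps=10):
    # Inner maximization: approximate the worst-case perturbation
    # inside an L-infinity ball of radius epsilon via projected
    # gradient ascent on the classification loss.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    # Outer minimization: a standard update on the adversarial batch.
    # Because delta is always bounded in this one way, the resulting
    # robustness tends to overfit to this perturbation type, which is
    # exactly the generalization problem this survey focuses on.
    optimizer.zero_grad()
    F.cross_entropy(model((x + delta.detach()).clamp(0, 1)), y).backward()
    optimizer.step()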
Model-structure-based defenses mainly address intrinsic weaknesses of the model's structure. According to how the structure is modified, they are divided into structure optimization for the target network and input data pre-processing. 1) Structure optimization for the target network aims to strengthen the model's ability to extract useful information from inputs and features, since the network itself is susceptible to their variations. 2) Input data pre-processing aims to eliminate threats from examples before they are fed into the target network. Removing adversarial noise from inputs and detecting adversarial examples in order to reject them are two popular strategies (a sketch of the detection idea follows below), because they are easy to model and rely less on adversarial training examples than methods such as adversarial training.

Finally, we analyze research trends in this area and summarize work in related directions. (1) Defending well against multiple known adversarial perturbations cannot guarantee robustness against various unseen attacks, but it does contribute to robustness against specific perturbation types. (2) With the development of defenses against unseen adversarial attacks, auxiliary tools such as acceleration modules have been proposed. (3) Defending against unseen common corruptions is beneficial for practical deployment, because adversarial perturbations cannot cover the whole perturbation space of the real world. In summary, defending against attacks that are entirely different from those used during training demonstrates stronger generalization ability, and the analysis built around this goal differs from that of conventional surveys on adversarial defense. We hope this survey can further motivate research on defending against unseen adversarial attacks.
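To illustrate the pre-processing and detection idea in item 2), here is a minimal sketch in the spirit of feature squeezing (Xu et al., 2018): if smoothing the input changes the model's prediction sharply, the input is flagged as likely adversarial. The model placeholder, the median-smoothing choice, and the detection threshold are all assumptions for illustration; a deployed detector would calibrate the threshold on validation data:

import torch
import torch.nn.functional as F

def detect_adversarial(model, x, threshold=1.0, kernel=3):
    # "Squeeze" the input with a median filter over local patches.
    pad = kernel // 2
    patches = F.unfold(F.pad(x, [pad] * 4, mode="reflect"), kernel)
    n, c, h, w = x.shape
    x_squeezed = (patches.view(n, c, kernel * kernel, h * w)
                         .median(dim=2).values.view(n, c, h, w))
    # Compare the class distributions before and after squeezing;
    # natural inputs are usually stable, adversarial ones are not.
    with torch.no_grad():
        p_raw = F.softmax(model(x), dim=1)
        p_squeezed = F.softmax(model(x_squeezed), dim=1)
    score = (p_raw - p_squeezed).abs().sum(dim=1)  # L1 distance per example
    return score > threshold  # True means "reject as adversarial"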
