A survey on adversarial training for robust learning

Sui Chenhong1, Wang Ao1, Zhou Shengwen1, Zang Ankang1, Pan Yunhao1, Liu Hao2,3, Wang Haipeng4 (1. School of Physics and Electronic Information, Yantai University, Yantai 264002, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 3. Wuhan Digital Engineering Institute, Wuhan 430205, China; 4. Information Fusion Research Institute, Naval Aviation University of the Chinese People's Liberation Army, Yantai 264001, China)

Abstract
Deep learning has achieved great success in many fields. However, its strong data-fitting ability hides the unexplained phenomenon of "shortcut learning", which makes deep models vulnerable. Many studies have shown that if an attacker adds slight perturbations to normal data that human beings cannot perceive, the model may produce catastrophically wrong outputs, which severely limits the application of deep learning in security-sensitive fields. To deal with the threat of malicious attacks, adversarial defenses must therefore be established to improve model robustness. In this regard, researchers have proposed a variety of adversarial defense methods.

The existing defense methods for deep neural networks can be divided into three categories: modifying-input-data-based methods, directly-enhancing-network-based methods, and adversarial-training-based methods. Modifying-input-data-based defense methods alter the input in advance, reducing the attack intensity at the input end via denoising or image transformation, among other techniques. Despite showing a certain anti-attack ability, these methods are not only limited by the attack intensity but also face the problem of over-correcting normal input data. The former limitation hinders them from dealing with slight perturbations that human beings cannot perceive, while the latter exposes them to the risk of making wrong judgments on normal data, thereby reducing classification accuracy. Directly-enhancing-network-based methods improve the anti-attack capability of the network by adding subnetworks or by changing the loss function, activation function, batch normalization layers, or network training process. Adversarial-training-based methods, compared with the other two, are typical heuristic defense methods. They inject the adversarial attack and adversarial defense into a single framework: adversarial examples are first generated by attacking the existing model, and these examples are then used to train the target model to produce accurate outputs for them, thereby enhancing its robustness.

This paper therefore focuses primarily on adversarial training. Apart from providing a certain ability to defend against attacks, adversarial training improves the robustness of the model at the cost of reducing its classification or recognition accuracy on normal data; many researchers find that the more robust the model, the lower its classification or recognition accuracy on normal examples. In addition, the defense effect of current adversarial training remains unsatisfactory against strong adversarial attacks with diversified attack modes. To address these issues, recent studies have improved standard adversarial training from different perspectives. For instance, some studies generate adversarial examples with high diversity or transferability in the attack stage. To enhance model robustness, many scholars combine adversarial training with network enhancement, which involves network structure modification, model parameter adjustment, and adversarial training acceleration, helping the model resist different types of attacks.
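For concreteness, the attack-plus-defense framework described above is commonly formalized (e.g., in the style of Madry and colleagues) as a min-max optimization problem; the notation below is ours rather than the survey's:

\[
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_{p} \le \epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \Big]
\]

where \(f_{\theta}\) is the model with parameters \(\theta\), \(\delta\) is a perturbation constrained to an \(\ell_p\)-ball of radius \(\epsilon\), and \(\mathcal{L}\) is the training loss. The inner maximization corresponds to the attack stage (adversarial example generation), and the outer minimization corresponds to the defensive training stage.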
Standard adversarial training considers only the classification of adversarial examples in the defense stage and ignores the classification accuracy on the original examples. In this connection, many works not only introduce spatial or semantic consistency constraints between the original and adversarial examples but also require the model to produce accurate outputs for the adversarial examples, thus ensuring that the model balances robustness and accuracy. To enhance the transferability of the model, curriculum learning, reinforcement learning, metric learning, and domain adaptation technologies have been integrated into adversarial training.

This paper then comprehensively reviews adversarial training technologies. First, the basic framework of adversarial training is elaborated. Second, typical methods and key technologies for the generation of adversarial examples are reviewed: we summarize generation methods based on image-space, feature-space, and physical-space attacks and, to improve the diversity of adversarial examples, introduce interpolation- and reinforcement-learning-based generation strategies. Given that standard adversarial training is extremely time-consuming, we also briefly describe optimization strategies based on temporal, spatial, and spatiotemporal mixed momentum, which are conducive to improving training efficiency. Defense is the fundamental problem of adversarial training, which incorporates the generated adversarial examples into training via loss minimization; we therefore briefly review the technologies typically used in the defensive training stage, including loss regularization terms, model enhancement mechanisms, parameter adaptation, early stopping, and semi-supervised or unsupervised expansion strategies. To evaluate model robustness, we summarize the popular datasets and typical attack methods. Finally, after sorting out the relevant adversarial training technologies, we note that challenges remain in dealing with combined multi-perturbation attacks and with the low efficiency of adversarial training, and we put forward these problems as directions for future research on adversarial training.
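As a concrete illustration of the defensive training stage and the consistency constraints discussed above, the following is a minimal PyTorch sketch of PGD-based adversarial training with a TRADES-style consistency alternative. All function names and hyperparameter values (eps, alpha, steps, beta) are illustrative assumptions, not prescriptions from the surveyed works.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples for inputs in [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                      # keep pixel values valid
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One step of standard adversarial training: inner maximization
    (attack) followed by outer minimization (defense)."""
    model.eval()
    x_adv = pgd_attack(model, x, y)                    # attack stage
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)            # train on adversarial examples
    loss.backward()
    optimizer.step()
    return loss.item()

def trades_style_loss(model, x, x_adv, y, beta=6.0):
    """Consistency-constrained alternative (in the spirit of TRADES):
    clean cross-entropy plus a KL term pulling adversarial outputs
    toward the clean outputs, balancing accuracy and robustness."""
    clean_logits, adv_logits = model(x), model(x_adv)
    natural = F.cross_entropy(clean_logits, y)
    consistency = F.kl_div(F.log_softmax(adv_logits, dim=1),
                           F.softmax(clean_logits, dim=1),
                           reduction="batchmean")
    return natural + beta * consistency
```

Note that in the original TRADES formulation the adversarial example is crafted by maximizing the KL consistency term rather than the cross-entropy; the sketch reuses the cross-entropy PGD attack above only for brevity.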