Published: 2023-03-16
Model Security
Received: 2022-05-07; revised: 2022-11-03; preprint: 2022-11-10
Supported by: National Natural Science Foundation of China (U20B2051, U1936214)
About the authors:
Guo Yusheng, male, Ph.D. candidate; research interests: neural network model security and information hiding. E-mail: ysguo20@fudan.edu.cn
Qian Zhenxing, corresponding author, male, professor; research interests: multimedia information hiding, multimedia forensics, digital watermarking, and AI security. E-mail: zxqian@fudan.edu.cn. Zhang Xinpeng, male, professor; research interests: media information security, signal processing in the encrypted domain, secure cloud computing, big-data privacy protection, and digital image processing. E-mail: zhangxinpeng@fudan.edu.cn. Chai Hongfeng, male, academician of the Chinese Academy of Engineering; research interests: financial information engineering management, financial technology, and security. E-mail: hfchai@fudan.edu.cn. *Corresponding author: Qian Zhenxing, zxqian@fudan.edu.cn
CLC number: TP183; TP389.1
Document code: A
Article ID: 1006-8961(2023)03-0836-14
Abstract
Objective: Backdoor attacks have become a major threat to convolutional neural networks. However, existing backdoor defenses usually require prior knowledge about the attack or the neural network model, which limits their application scenarios. Building on the image classification task, this paper proposes a backdoor defense based on non-semantic information suppression; it requires no such prior knowledge and defends simply by encoding and decoding the network input. Method: The core idea is to weaken, as far as possible, the information in a sample that is irrelevant to its semantics while keeping the image semantics unchanged, thereby suppressing any trigger. This is realized by placing a plug-and-play U-shaped network (the information purification network) in front of the model to be protected. Its input is a clean original sample; its output is called the strengthened sample. During training, several clean classifiers with different architectures are first trained under different hyperparameters; then, under the constraint that the strengthened samples are still classified correctly by these classifiers, the information purification network is optimized to make the difference between the strengthened and original samples as large as possible. Result: Experiments are conducted on MNIST, CIFAR10, and ImageNet10. After encoding and decoding by the information purification network, the classification accuracy on clean samples drops only slightly, the backdoor attack success rate falls sharply, and samples carrying triggers are predicted correctly with accuracy close to that of clean samples. Conclusion: The proposed non-semantic information suppression defense corrects trigger-carrying samples into normal samples without any prior knowledge while preserving classification accuracy on clean samples.
Keywords
convolutional neural network (CNN); model security; image classification; neural network backdoor; backdoor defense
Abstract
Objective Convolutional neural networks (CNNs) have shown great potential in computer science, electronic information, mathematics, and finance. Their security, however, is challenged across multiple domains. In the training process of supervised learning, an adversary who adds trigger-carrying samples to the dataset and changes their labels to a target label can make the model predict any trigger-carrying sample as that target label in the inference stage. Backdoor attacks severely threaten the interests of model owners, especially in high value-added areas such as financial security. A series of defense strategies have been implemented to protect neural network models from backdoor attacks. However, conventional defense methods often require prior knowledge of the attack method or of the neural network model, such as the type and size of the trigger, which is rarely available in practice and limits the application scenarios of these defenses. To resolve this problem, we develop an input-based backdoor defense for image classification, called the information purification network (IPN). Encoding and decoding inputs with the IPN eliminates the impact of trigger-carrying samples. Method Image samples contain a large amount of redundant information. We divide image information into two categories: 1) semantic information relevant to the classification task, and 2) non-semantic information irrelevant to it. To make a sample be predicted as the target label, a backdoor attack forces the model to pay attention to the non-semantic information of the sample during training. To suppress triggers, the IPN is a CNN that encodes and decodes the input samples, aiming to keep the image semantics unchanged while minimizing the non-semantic information of the original samples.
The inputs to the IPN are clean samples, and the outputs are called modified samples. For training, several clean classifiers with multiple architectures and training hyperparameters are first trained. Then, the IPN is optimized to make the difference between the modified and original samples as large as possible, under the constraint that the modified samples are still predicted correctly by the above classifiers. The loss function consists of two parts: 1) semantic information retention and 2) non-semantic information suppression. The weights of the two parts are balanced to control the difference between the modified and original samples. Decoding by the IPN disrupts the structure of any trigger, so a sample will not be predicted as the target label even if the model contains a backdoor. In addition, because the semantic information of the sample is not weakened, trigger-carrying samples are predicted with their correct labels whether or not the model contains a backdoor. Result All experiments are performed on an NVIDIA GeForce RTX 3090 graphics card. The execution environment is Python 3.8.5 with PyTorch 1.9.1. The datasets are CIFAR10, MNIST, and ImageNet10. ImageNet10 is constructed by randomly selecting 10 categories from the ImageNet dataset, 12 831 images in total; 10 264 images are randomly selected for training and the remaining 2 567 for testing. The IPN uses the U-Net architecture. To evaluate the defense performance of the proposed strategy in detail, a variety of triggers are used to implement backdoor attacks. On MNIST, the clean model classifies the initial clean samples with 99% accuracy, and we implement backdoor attacks with two different triggers.
The average classification accuracy on clean samples is 99%, and the backdoor attack success rates are 100%. After all samples are encoded and decoded by the IPN, the classification accuracy on clean samples remains essentially unchanged, the attack success rate drops to about 10%, and 98% of backdoor samples are predicted with their correct labels. The experimental results on the other two datasets are similar: the accuracy on clean samples decreases slightly, the attack success rate drops to about 10%, and backdoor samples are correctly predicted with high accuracy. It should be mentioned that the intensity and size of the trigger affect the defensive performance to a certain extent, and the weight between the two parts of the loss function affects the accuracy on clean samples: the weight of the non-semantic information suppression loss is positively correlated with the image difference and negatively correlated with clean-sample accuracy. Conclusion The proposed strategy requires no prior knowledge of the triggers or of the model to be protected. Clean-sample accuracy remains almost unchanged, the backdoor attack success rate becomes equivalent to random guessing, and backdoor samples are predicted with their correct labels regardless of whether the classifier contains a backdoor. Training the IPN requires only clean training data and knowledge of the protected model's task. For deployment, the IPN is simply placed in front of the protected model to preprocess input samples. Multiple backdoor attacks are simulated on the three datasets, and the experimental results show that the proposed defense generalizes well across different attacks.
Key words
convolutional neural network (CNN); model security; image classification; neural network backdoor; backdoor defense
0 Introduction
Advances in communications and computing have driven the birth and spread of artificial neural networks, and many architectures with different structures and functions have been proposed. Convolutional neural networks, an important branch, have shown excellent performance in image classification (Krizhevsky et al., 2012), object detection (Girshick, 2015; Szegedy et al., 2015), face recognition (Ding and Tao, 2015), image captioning (Xu et al., 2015), natural language processing (Collobert and Weston, 2008), and malware detection (Biggio et al., 2013), even surpassing traditional methods. Beyond computer science, neural networks have attracted wide attention in other disciplines, and many interdisciplinary techniques are flourishing. For example, neural network models are applied to financial tasks such as credit risk assessment (Zhang and Chen, 2015; Mohammadi and Zangeneh, 2016), time-series forecasting (Kristjanpoller et al., 2015), and advertisement recommendation (Hidasi et al., 2016); they are combined with communications and mechanical engineering in drones and robotics (Mnih et al., 2015; Melis et al., 2017); and they are fused with signal processing and information hiding (Yin et al., 2022; Sun et al., 2022) to generate new algorithms from existing concepts.
However, as neural networks improve, their training cost keeps growing. Individuals and small institutions often lack the computing resources to train a complete model from scratch, so purchasing model-training services from well-resourced companies is becoming the mainstream. Such providers are not necessarily trustworthy: they may implant a backdoor during training, and triggering it makes the model behave abnormally, harming the buyer's interests. In financial applications such as credit risk assessment, a backdoor attack may cause severe economic losses. Suppose an attacker implants a backdoor into an assessment model so that customers with a specific attribute (for example, assets of 10 000 RMB) are labeled as low credit risk and granted credit far exceeding their assets; when high-risk customers are classified as low-risk at scale, a serious credit crisis may follow. Likewise, in intelligent robotics and autonomous driving, printing a specific trigger on a traffic sign can cause the neural network to misrecognize it and lead to serious traffic accidents.
Backdoor attacks (Gu et al., 2017, 2019) are a typical attack mode. Unlike traditional backdoors that embed malicious code in an operating system or application, a neural network backdoor adds trigger-carrying samples to the training set of supervised learning and changes their labels to a specific target label, inducing the model to predict trigger-carrying samples as that target label in the inference stage. Figs. 1 and 2 show backdoor implantation and attack in an image classification task: during training, the labels of clean samples stay unchanged, while samples carrying the trigger, a 3 × 3 black-and-white checkerboard, are relabeled with the target label ("airplane"). As Fig. 2 shows, clean samples are classified correctly at inference, but backdoor samples are predicted as the target label, even though the two samples are identical everywhere outside the trigger region.
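The poisoning step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure; the corner placement, 3 × 3 checkerboard, poisoning ratio, and target label index are illustrative assumptions.

```python
import numpy as np

def add_checkerboard_trigger(img: np.ndarray, size: int = 3, strength: float = 1.0) -> np.ndarray:
    """Stamp a size x size checkerboard into the bottom-right corner of an HxWxC image in [0, 1]."""
    out = img.copy()
    pattern = np.indices((size, size)).sum(axis=0) % 2      # alternating 0/1 checkerboard
    out[-size:, -size:, :] = strength * pattern[..., None]  # overwrite the corner pixels
    return out

def poison_dataset(images, labels, target_label, poison_ratio=0.1, seed=0):
    """Stamp the trigger onto a random fraction of samples and relabel them as the target."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_ratio * len(images)), replace=False)
    for i in idx:
        images[i] = add_checkerboard_trigger(images[i])
        labels[i] = target_label
    return images, labels
```

A model trained on the returned set learns the intended task on clean samples while associating the checkerboard corner with the target label.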
To guarantee the reliability of neural network models, defending against backdoor attacks has become an important research direction in model security. Among the many defense strategies, blind backdoor removal (Gao et al., 2020) is one of the most promising. Its main feature is that it needs no prior knowledge of whether a backdoor has been injected and does not distinguish clean inputs from backdoor inputs; its goal is to suppress or even eliminate the effect of backdoor attacks while preserving clean-sample accuracy. Injecting a backdoor inevitably leaves traces in both the model and the input data, and defenses accordingly start from these two sides. For example, Liu et al. (2018) proposed pruning the neurons that contribute little to classification to remove the effect of backdoor attacks: neurons are ranked by their activation on clean samples and pruned from least to most active. This implicitly assumes that the neurons activated by clean samples and by trigger samples are separable; in most cases the assumption does not hold, so the method sharply reduces clean-sample accuracy. Wang et al. (2019) proposed fine-tuning the model after pruning to compensate for the accuracy loss, but the resulting accuracy is still unsatisfactory. Moreover, models contain a huge number of neurons, so fine-grained pruning incurs a high computational cost (Liu et al., 2017). Other studies (Weber et al., 2020; Wang et al., 2020) proposed a provably robust training process against backdoor attacks, whose main idea is to use randomized smoothing to mitigate the trigger; robustness is provable when the trigger perturbation is bounded, but the defense degrades once the perturbation exceeds that bound.
In practical scenarios, a model is usually purchased from a provider with abundant computing resources, and in most cases the buyer lacks the computing power to remove a backdoor from the model itself. Input-side blind backdoor removal has therefore become a research focus. For example, Doan et al. (2020) proposed Februus, which deploys a filter in front of the neural network to cut the trigger out of the input image and replace it with a mid-gray patch, or to inpaint the region with a generative adversarial network (GAN). However, once the trigger is global, clean-sample accuracy drops sharply, and the defense weakens against interpretability-based backdoor attacks (Fang and Choromanska, 2022). Sarkar et al. (2020) deployed a wrapper outside the trained model to suppress the trigger's influence: several copies of each input image are generated, each copy is perturbed with some noise, the copies are fed into the model to obtain multiple predicted labels, and the final label is obtained by voting. This works well when the trigger modifies the original image only slightly, but the suppression weakens as the trigger strength grows. ConFoc (content focus) (Villarreal-Vasquez and Bhargava, 2020) forces the model to focus on the semantic content of the input image and ignore the trigger, producing the label consistent with the input's semantics (even a trigger-carrying image is predicted as its true label rather than the target label). Its limitation is the assumption that the trigger information and the image semantics do not overlap, which is often unrealistic.
Building on the image classification task, this paper proposes an input-based backdoor defense strategy. Like ConFoc, the main idea is to suppress the non-semantic information in the input image while preserving its semantic information as much as possible, thereby achieving backdoor defense. Unlike ConFoc, we no longer require the image semantics and the trigger information to be non-overlapping; instead, a convolutional neural network adaptively removes the non-semantic information from the image. Specifically, a group of convolutional neural network models with different architectures, called classifiers, is trained on clean samples under different training conditions. These classifiers then guide the update of a convolutional neural network with image encoding and decoding capability, called the information purification network (IPN). A clean sample encoded and decoded by the IPN must still receive the same predicted label from the above classifiers, while the Euclidean distance between the images before and after encoding and decoding is made as large as possible.
The proposed defense strategy requires no prior knowledge of whether the classifier contains a backdoor, of the kind of backdoor attack, or of the trigger's perturbation strength. In the inference stage, one only needs to deploy, in front of the classifier, an IPN trained for the relevant classification task. The IPN is independent of the classifier's architecture and parameters and depends only on the classification task, so it needs to be trained only once per task and can serve as a plug-and-play security service.
1 Threat model and preliminaries
1.1 Threat model
This paper assumes a three-party scenario. A buyer needs to classify some image samples but lacks strong computing power, so it purchases a trained classifier from a model provider. The purchased classifier may contain an injected backdoor; to defend against potential backdoor attacks, the buyer purchases an IPN service from a trusted third party. We assume the third party has the resources needed for model training, including a clean dataset and computing power, but knows neither the architecture of the potentially backdoored model nor the type and size of any potential trigger.
1.2 Preliminaries
A trained convolutional neural network has a certain robustness: slightly perturbing the input image generally does not affect the classifier's accuracy. To state this more precisely, let CDA (clean data accuracy) denote the classification accuracy on clean samples and NDA_ϵ (noisy data accuracy) the accuracy on samples perturbed by noise of strength ϵ; then
$ \left|C D A-N D A_\boldsymbol{\epsilon}\right|<\varepsilon, \boldsymbol{\epsilon}<\delta $ | (1) |
where ε is a small constant and δ is an upper bound on the noise strength ϵ. Denote the classifier by C and define, for a sample x with true label y,
$ f(\boldsymbol{x})= \begin{cases}1 & C(\boldsymbol{x})=y \\ 0 & \text { otherwise }\end{cases} $ | (2) |
where C(x) is the label predicted by C. For a dataset D in which each sample x_i is perturbed by random noise N_{ϵ_i} of strength ϵ_i, the robustness condition can be written as
$ \sum\limits_{\boldsymbol{x}_i \in \boldsymbol{D}}\left[f\left(\boldsymbol{x}_i\right)-f\left(\boldsymbol{x}_i+N_{\boldsymbol{\epsilon}_i}\right)\right] /|\boldsymbol{D}|<\varepsilon, \boldsymbol{\epsilon}<\delta $ | (3) |
where |D| is the number of samples in D.
Fig. 3 shows the classification accuracy of a PreActResNet18 classifier trained on the ImageNet10 dataset under noise perturbations of different strengths; the horizontal axis is the perturbation strength ϵ and the vertical axis is the classification accuracy.
On the other hand, although randomly chosen noise rarely changes the classifier's prediction, carefully designed tiny perturbations can make the classifier predict incorrectly; such inputs are adversarial examples (Szegedy et al., 2014). Fig. 4 shows an adversarial example, in which the noise image is magnified 50 times for visualization, while the actual noise strength is far smaller.
Many algorithms generate adversarial examples (Goodfellow et al., 2014; Carlini and Wagner, 2017; Moosavi-Dezfooli et al., 2017; Liu et al., 2022; Wang et al., 2022), but most share the same idea: make the sample be misclassified while limiting the modification amplitude. Specifically,
$ \min \limits_{x_i \in \boldsymbol{D}}\left\|\boldsymbol{r}_i\right\|+c \times\left[-Z\left(\boldsymbol{a d} \boldsymbol{v}_i\right)_t\right] $ | (4) |
where r_i is the perturbation added to sample x_i, adv_i = x_i + r_i is the adversarial example, Z(·)_t is the model's output confidence for the target class t, and c balances the two terms.
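As a minimal numeric illustration of the objective in Eq. (4), the sketch below evaluates ‖r‖ + c·[−Z(adv)_t] for a toy softmax model. The linear weight matrix and the constant c are arbitrary assumptions for demonstration, not part of the paper's method.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def adv_objective(x, r, W, t, c=1.0):
    """Eq. (4): perturbation norm plus c times the negated target-class confidence."""
    adv = x + r                  # adversarial example adv = x + r
    Z = softmax(W @ adv)         # toy linear model followed by softmax
    return np.linalg.norm(r) + c * (-Z[t])
```

Minimizing this objective over r trades off a small perturbation (the norm term) against high confidence for the target class t (the negated-confidence term).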
2 Method
The previous section described the weak robustness of convolutional neural networks to random noise and their vulnerability to adversarial example attacks. We argue that adversarial perturbations suppress the classifier's attention to the semantic information of a sample, misleading it into a wrong prediction, whereas a backdoor attack forces the classifier, during training, to pay more attention to the trigger information in a sample. If trigger-contaminated samples are also treated as normal samples, the weak robustness of neural networks implies that the trigger's robustness to noise perturbations is likewise limited; moreover, a backdoored model is also vulnerable to adversarial examples. A natural idea is therefore to borrow the adversarial-example methodology: add noise that suppresses the non-semantic information in the input and breaks the trigger's structure, thereby defending against backdoor attacks.
The method consists of two modules: model pretraining and sample non-semantic information suppression. As shown in Fig. 5, during model pretraining, four model architectures are trained on clean samples; for each architecture, several classifiers are trained under different conditions (including initialization, optimizer, learning rate, batch size, and data augmentation). In the non-semantic information suppression module, a U-shaped network serves as the IPN to encode and decode the original samples. In each training epoch, an original sample encoded and decoded by the IPN is fed into a randomly chosen pretrained classifier. During optimization, the classifier only back-propagates gradient information; its parameters stay fixed. The implementation details are as follows.
2.1 Model pretraining
A classification convolutional neural network can be viewed as a classifier that extracts semantic features from images with convolutional blocks and classifies those features with fully connected layers. Different model architectures generally extract different features; in addition, the initialization parameters, optimizer, learning rate, data augmentation, and batch size all affect the result of feature extraction. We therefore choose four classic network architectures, VGG19 (Visual Geometry Group network) (Simonyan and Zisserman, 2015), ResNet18 (residual network) (He et al., 2016a), PreActResNet18 (He et al., 2016b), and SimpleDLA (simple deep layer aggregation) (Yu et al., 2018), as the pretrained architectures. For each architecture, different initializations, optimizers, learning rates, and batch sizes are set, and the model is trained five times on the clean dataset, yielding 20 clean models, whose set is denoted
$ \boldsymbol{C}=\left\{C_{k, h}\right\} $ | (5) |
where k ∈ {1, 2, 3, 4} indexes the model architecture and h ∈ {1, 2, 3, 4, 5} indexes the training condition; C_{k,h} is the h-th classifier of the k-th architecture.
The training conditions of the five models within each architecture are listed in Table 1, where SGD (stochastic gradient descent) is the stochastic gradient descent optimizer and Adam (adaptive moment estimation) is the adaptive moment estimation optimizer (Kingma and Ba, 2014). Besides common augmentations such as random flipping and resizing, Table 1 lists the data augmentation specific to each model's training.
Table 1
Training conditions of different models
| Model No. | Optimizer | Learning rate | Batch size | Data augmentation |
| 1 | Adam | 1E-4 | 64 | random crop |
| 2 | Adam | 1E-4 | 32 | center crop |
| 3 | Adam | 5E-4 | 64 | random crop |
| 4 | SGD | 1E-3 | 64 | random crop |
| 5 | Adam_max | 1E-3 | 64 | random crop |
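Combining Table 1's five training conditions with the four architectures yields the 20-classifier set C = {C_{k,h}} of Eq. (5). The configuration grid can be sketched as below; only the enumeration is shown, and the actual training code is omitted.

```python
from itertools import product

ARCHITECTURES = ["VGG19", "ResNet18", "PreActResNet18", "SimpleDLA"]  # k = 1..4

# h = 1..5, following Table 1: (optimizer, learning rate, batch size, augmentation)
CONDITIONS = [
    ("Adam",     1e-4, 64, "random crop"),
    ("Adam",     1e-4, 32, "center crop"),
    ("Adam",     5e-4, 64, "random crop"),
    ("SGD",      1e-3, 64, "random crop"),
    ("Adam_max", 1e-3, 64, "random crop"),
]

def classifier_set():
    """Enumerate the ensemble C = {C_{k,h}} as (k, h, architecture, condition) tuples."""
    return [(k, h, arch, cond)
            for (k, arch), (h, cond) in product(enumerate(ARCHITECTURES, 1),
                                                enumerate(CONDITIONS, 1))]
```

During IPN training, one of these 20 configurations is sampled at random in each step, so the purification network cannot overfit any single classifier's feature extractor.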
2.2 Non-semantic information suppression
Adversarial examples use specific perturbations to suppress the classifier's attention to the semantic information of an image, causing the sample to be mispredicted. Most backdoor attacks, in contrast, inject poisoned trigger-carrying samples into the training set during the training stage, forcing the classifier to be sensitive to the trigger and to predict any sample containing it as the target class. At the same time, to maintain the classification accuracy on clean samples, the backdoored classifier must still attend to the image's semantic information.
For a clean sample x_i with true label y, the defense requires that its strengthened counterpart x_i′ still be classified correctly by each clean classifier C_k, that is,
$ \min \left|Z_k\left(\boldsymbol{x}_i^{\prime}\right)_y-1\right| $ | (6) |
where Z_k(·)_y is the softmax confidence of classifier C_k for the true class y. Meanwhile, the perturbation r_i should be as large as possible:
$ \max \left\|\boldsymbol{r}_i\right\|=\left\|\boldsymbol{x}_i^{\prime}-\boldsymbol{x}_i\right\| $ | (7) |
where ‖·‖ is a metric measuring the similarity of two samples; the Euclidean distance is used in this paper.
As shown in Fig. 6, this perturbation keeps the strengthened sample x_i′ correctly classified while making it differ from the original sample as much as possible, which suppresses the non-semantic information without weakening the semantics.
Strengthened samples generated per-sample by adversarial-example algorithms transfer poorly. To overcome this weakness, a convolutional neural network, the IPN, is chosen to generate strengthened samples for all inputs by solving, over the dataset D and the classifier set,
$ \mathop {\min }\limits_{\scriptstyle x_i \in \boldsymbol{D} \atop \scriptstyle k \in\{1, 2, 3, 4\} ;h \in\{1, 2, 3, 4, 5\} }\left|Z_{k, h}\left(\boldsymbol{x}_i\right)-Z_{k, h}\left(\boldsymbol{x}_i^{\prime}\right)\right| $ | (8) |
$ \max \limits_{x_i \in \boldsymbol{D}}\left\|\boldsymbol{x}_i^{\prime}-\boldsymbol{x}_i\right\| $ | (9) |
where Z_{k,h}(·) denotes the softmax output of classifier C_{k,h} and D is the clean training set. The two objectives are combined into the loss
$ \mathcal{L}=\mathcal{L}_1+\lambda \mathcal{L}_2 $ | (10) |
where λ is the hyperparameter balancing the two terms; L1 retains semantic information and L2 suppresses non-semantic information. Specifically,
$ \mathcal{L}_1=\frac{1}{N} \sum\limits_{i=1}^N \sum\limits_{j=1}^M\left(Z_{k, h}\left(\boldsymbol{x}_i^{\prime}\right)\right)_j \log \left(Z_{k, h}\left(\boldsymbol{x}_i\right)\right)_j $ | (11) |
$ \mathcal{L}_2=\frac{1}{N} \sum\limits_{i=1}^N\left(\boldsymbol{x}_i^{\prime}-\boldsymbol{x}_i\right)^2 $ | (12) |
where N is the number of samples in a batch, M is the number of classes, and (·)_j denotes the confidence of the j-th class.
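A numeric sketch of the loss in Eqs. (10)-(12) follows. The sign conventions here are our assumption: L1 is implemented as the cross-entropy between the classifier's outputs on the strengthened and original samples (semantic retention), and the squared-difference term enters the total loss with a negative sign so that minimizing the loss enlarges the sample difference of Eq. (9); the printed form L = L1 + λL2 can fold this sign into λ.

```python
import numpy as np

def semantic_loss(Z_mod, Z_orig, eps=1e-12):
    """L1, Eq. (11): cross-entropy between softmax outputs on modified and original samples."""
    return -np.mean(np.sum(Z_mod * np.log(Z_orig + eps), axis=1))

def suppression_loss(x_mod, x_orig):
    """L2, Eq. (12): mean squared difference between modified and original samples."""
    return np.mean((x_mod - x_orig) ** 2)

def ipn_loss(Z_mod, Z_orig, x_mod, x_orig, lam=1.5):
    """Total loss, Eq. (10). L2 is negated here so that gradient descent
    maximizes the sample difference (a sign-convention assumption)."""
    return semantic_loss(Z_mod, Z_orig) - lam * suppression_loss(x_mod, x_orig)
```

When the strengthened sample is identical to the original, L2 vanishes and the loss reduces to the entropy-like semantic term; training then pushes x′ away from x only as far as the classifiers still agree on the label.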
3 Results
3.1 Experimental setup
All experiments are run on an NVIDIA GeForce RTX 3090 graphics card. The execution environment is Python 3.8.5 with PyTorch 1.9.1. The datasets are CIFAR10, MNIST, and ImageNet10. CIFAR10 contains 60 000 color images of size 3 × 32 × 32 in 10 classes; 50 000 are used for training and the rest for testing. The handwritten digit dataset MNIST consists of 70 000 grayscale images of the digits 0-9, resized to 32 × 32 pixels; 60 000 are used for training and the rest for testing. ImageNet10 is constructed by randomly selecting 10 classes from the ImageNet dataset, 12 831 images in total; 10 264 are randomly chosen for training and the remaining 2 567 for testing. The information purification network adopts the U-Net architecture (Ronneberger et al., 2015).
3.2 Experimental results and analysis
The experiments verify the defense performance of the proposed strategy, its fidelity to clean-sample accuracy (CDA), and the influence of the balance hyperparameter and of trigger and model settings.
3.2.1 Defense performance and CDA fidelity
To demonstrate the defense performance of the proposed strategy, five potential backdoor attack modes are considered; their triggers are shown in Fig. 7. The solid-color line is visually the most conspicuous, the 3 × 3 checkerboard requires zooming into the image to see, and the global noise is almost visually imperceptible.
The defense performance of the information purification network against these backdoor attacks on ImageNet10 is listed in Table 2.
Table 2
The performance of IPN against different backdoor attacks on the ImageNet10 dataset
/%
| Trigger type | Initial CDA | Initial attack success rate | Strengthened CDA | Strengthened attack success rate | Correction rate |
| Fig. 7(a) | 93.845 | 9.583 | 92.365 | 9.233 | NA |
| Fig. 7(b) | 94.157 | 100.000 | 91.702 | 20.764 | 81.067 |
| Fig. 7(c) | 94.507 | 100.000 | 92.715 | 9.388 | 92.326 |
| Fig. 7(d) | 93.494 | 99.961 | 92.637 | 9.310 | 92.443 |
| Fig. 7(e) | 93.962 | 100.000 | 92.988 | 9.272 | 92.988 |
| Fig. 7(f) | 91.858 | 99.961 | 90.767 | 8.492 | 90.261 |
Note: NA means no valid data.
To verify the cross-dataset defense performance of the proposed strategy, two groups of experiments are conducted on CIFAR10 and MNIST. Because the images in these two datasets are small, no solid-color line trigger is designed; moreover, MNIST images are all zeros in the corners, so the checkerboard is also unsuitable there. Three local triggers are therefore designed for these datasets, as shown in Fig. 8: Fig. 8(a)(b) are triggers for CIFAR10 and Fig. 8(c) is the trigger for MNIST. In addition, a checkerboard of matching size is used as a global trigger for each dataset.
The triggers in Fig. 8(a)(b) and a global checkerboard trigger are selected to implement backdoor attacks on CIFAR10; the defense results are listed in Table 3.
Table 3
The performance of IPN against different backdoor attacks on the CIFAR10 dataset
/%
| Trigger type | Initial CDA | Initial attack success rate | Strengthened CDA | Strengthened attack success rate | Correction rate |
| Clean model | 92.93 | 9.96 | 91.54 | 10.38 | NA |
| Center | 92.68 | 100.00 | 91.36 | 20.23 | 75.79 |
| Bottom-right | 93.12 | 100.00 | 91.66 | 10.28 | 91.36 |
| Global | 92.52 | 99.39 | 91.39 | 17.05 | 83.52 |
Note: NA means no valid data.
The trigger in Fig. 8(c) and a global checkerboard trigger are used to implement backdoor attacks on MNIST; the defense results are listed in Table 4.
Table 4
The performance of IPN against different backdoor attacks on the MNIST dataset
/%
| Trigger position | Initial CDA | Initial attack success rate | Strengthened CDA | Strengthened attack success rate | Correction rate |
| Clean model | 99.35 | 9.85 | 98.80 | 9.90 | NA |
| Bottom-right | 99.23 | 100.00 | 98.68 | 15.88 | 90.60 |
| Global | 99.31 | 100.00 | 98.74 | 9.92 | 98.76 |
Note: NA means no valid data.
Taken together, Tables 2-4 show that the proposed information purification strategy defends well against different triggers on multiple datasets. For most backdoor samples, the IPN preserves clean-sample accuracy while correcting trigger-carrying samples to their true labels.
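The three quantities reported in Tables 2-4 can be computed as below. The metric names follow the tables (CDA, attack success rate, correction rate); the array layout of the prediction vectors is an assumption for illustration.

```python
import numpy as np

def cda(pred_clean: np.ndarray, labels: np.ndarray) -> float:
    """Clean data accuracy: fraction of clean samples predicted correctly."""
    return float(np.mean(pred_clean == labels))

def attack_success_rate(pred_triggered: np.ndarray, labels: np.ndarray, target: int) -> float:
    """Fraction of non-target-class samples that the trigger flips to the target label."""
    mask = labels != target
    return float(np.mean(pred_triggered[mask] == target))

def correction_rate(pred_triggered: np.ndarray, labels: np.ndarray, target: int) -> float:
    """Fraction of non-target-class triggered samples restored to their true labels."""
    mask = labels != target
    return float(np.mean(pred_triggered[mask] == labels[mask]))
```

Samples whose true label already equals the target are excluded from the attack and correction rates, since the trigger cannot "flip" them; this matches the NA entries for clean models in the tables, where no correction rate is defined.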
To explore the relationship between trigger saliency and IPN defense performance, PreActResNet18 and U-Net are chosen as the backdoor model and IPN architectures, respectively. The trigger is designed as a checkerboard of the same size as the sample; both the backdoor model and the IPN are trained on ImageNet10 with the training hyperparameters unchanged. Table 5 lists the results for checkerboard triggers of different strengths.
Table 5
The impact of trigger strength on defense performance
/%
| Trigger strength ϵ | Initial CDA | Initial attack success rate | Strengthened CDA | Strengthened attack success rate | Correction rate |
| 0 | 93.845 | 9.583 | 92.365 | 9.233 | NA |
| 5 | 91.858 | 99.961 | 90.767 | 8.492 | 90.261 |
| 10 | 93.689 | 99.961 | 91.858 | 10.206 | 90.728 |
| 15 | 93.533 | 99.961 | 91.430 | 8.142 | 89.053 |
| 20 | 92.871 | 100.000 | 91.235 | 9.349 | 88.274 |
| 30 | 93.845 | 100.000 | 92.287 | 9.116 | 87.067 |
| 40 | 93.377 | 100.000 | 91.819 | 11.453 | 83.716 |
| 50 | 93.299 | 100.000 | 90.845 | 13.829 | 77.756 |
Note: NA means no valid data.
To present the impact of trigger strength and of the IPN on visual quality more intuitively, Fig. 9 lists backdoor samples and strengthened samples for triggers of different strengths: the first row shows the backdoor samples and the second row the strengthened samples, with trigger strengths 0, 5, 10, 15, 20, 30, 40, and 50. As the strength increases, the trigger becomes more and more visible, especially in flat regions of the image; clear checkerboard structures appear in the sky at the upper right of the sample and on the cattle saddle in the middle. This means that further increasing the trigger strength would degrade the image's visual quality, which contradicts the stealthiness principle of backdoor attacks. Meanwhile, Fig. 9 shows that the visual quality of the strengthened samples is comparable to that of their backdoor counterparts, but the trigger's structure is destroyed: no large-area checkerboard appears in the strengthened samples. This indicates that the IPN's encoding and decoding suppresses the image's non-semantic information while retaining most of the semantic information, confirming the earlier conjecture.
We next discuss the influence of the backdoor model architecture on IPN performance. U-Net is still used as the IPN architecture and the balance coefficient λ is kept unchanged; backdoor models with each of the four architectures are evaluated, with results listed in Table 6.
Table 6
The impact of network architecture on the performance of IPN
/%
| Architecture | Model type | Initial CDA | Initial attack success rate | Strengthened CDA | Strengthened attack success rate | Correction rate |
| VGG19 | clean | 93.884 | 9.038 | 92.520 | 8.687 | NA |
| VGG19 | backdoor | 93.611 | 100.000 | 91.975 | 8.376 | 91.975 |
| ResNet18 | clean | 93.339 | 9.778 | 92.559 | 9.778 | NA |
| ResNet18 | backdoor | 92.365 | 100.000 | 91.975 | 9.233 | 92.014 |
| PreActResNet18 | clean | 93.222 | 9.466 | 91.858 | 9.310 | NA |
| PreActResNet18 | backdoor | 93.962 | 100.000 | 92.988 | 9.272 | 92.988 |
| SimpleDLA | clean | 93.572 | 9.583 | 92.676 | 9.466 | NA |
| SimpleDLA | backdoor | 94.118 | 100.000 | 93.105 | 9.310 | 93.105 |
Note: NA means no valid data.
3.2.2 Influence of the balance hyperparameter λ
During IPN training, the balance hyperparameter λ trades off semantic information retention against non-semantic information suppression.
If Eqs. (8)(9) expressed a convex optimization problem, a suitably chosen balance hyperparameter λ would suffice to reach the optimum; in practice the optimization is non-convex, so λ must be tuned empirically.
Table 7 shows the defense results under different values of λ.
Table 7
The impact of the balance hyperparameter λ
/%
| λ | Clean model CDA | Clean model attack success rate | Backdoor model CDA | Backdoor model attack success rate | Correction rate |
| Baseline | 92.92 | 10.14 | 93.12 | 100.00 | NA |
| 0.5 | 92.32 | 10.16 | 92.54 | 27.34 | 79.08 |
| 1 | 92.09 | 10.14 | 92.29 | 14.22 | 89.49 |
| 1.5 | 91.67 | 9.67 | 91.66 | 10.28 | 91.36 |
| 2 | 89.74 | 11.42 | 89.28 | 11.29 | 88.38 |
| 3 | 90.04 | 10.22 | 86.96 | 13.30 | 86.04 |
Note: NA means no valid data.
The first row of Fig. 10 shows clean and backdoor samples without IPN encoding and decoding; rows 2-4 show the corresponding strengthened samples obtained with different values of λ.
3.2.3 Performance comparison with other schemes
In practical application scenarios, the proposed defense strategy is plug-and-play, unlike Februus (Doan et al., 2020), which needs access to the protected model in advance, and ConFoc (Villarreal-Vasquez and Bhargava, 2020), which needs retraining. It is most similar to the plug-and-play wrapper strategy of Sarkar et al. (2020), which generates several copies of each input image, adds noise of different types and strengths to each copy, feeds the copies into the classifier to obtain multiple predicted labels, and obtains the final label by voting. We compare the proposed strategy against Sarkar et al. (2020) on the MNIST dataset, with the classifier architecture set to VGG11 and the backdoor attack using a global trigger; the results are listed in Table 8.
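The voting wrapper of Sarkar et al. (2020) described above can be sketched as follows. The number of copies and the Gaussian noise strength are illustrative assumptions, and `classify` stands for any black-box classifier returning a label.

```python
import numpy as np
from collections import Counter

def voted_predict(x, classify, n_copies=5, noise_strength=0.1, seed=0):
    """Majority vote over the classifier's predictions on noisy copies of the input."""
    rng = np.random.default_rng(seed)
    votes = [classify(x + rng.normal(0.0, noise_strength, size=x.shape))
             for _ in range(n_copies)]
    return Counter(votes).most_common(1)[0][0]  # most frequent predicted label
```

The added noise is meant to break a weak trigger in at least some copies so that the honest label wins the vote; as Table 8 suggests, the approach degrades for stronger or more localized triggers, whereas the IPN rewrites the whole input once instead of averaging over noisy copies.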
Table 8
Defense performance of IPN and Sarkar et al. (2020)
/%
| Method | CDA (global) | Attack success rate (global) | CDA (bottom-right) | Attack success rate (bottom-right) |
| Original model | 99.31 | 100.00 | 99.23 | 100.00 |
| Sarkar(3) | 99.06 | 69.38 | 99.22 | 100.00 |
| Sarkar(5) | 92.89 | 9.46 | 99.15 | 100.00 |
| Sarkar(9) | 94.28 | 9.64 | 99.15 | 100.00 |
| IPN | 98.74 | 9.92 | 98.68 | 15.88 |
4 Conclusion
This paper proposes a backdoor defense for image classification neural networks based on non-semantic information suppression. The method requires no prior knowledge of the backdoor trigger or of the model to be protected; simply encoding and decoding the classifier's input removes the trigger's effect and achieves the defense goal. The encoding and decoding are realized by the information purification network, itself a convolutional neural network whose role is to suppress the non-semantic information in images. Training the information purification network requires only clean training data and the protected model's classification task, with no other prior knowledge such as trigger information or model architecture. To implement the defense, the information purification network is simply deployed in front of the protected model to encode and decode input samples, making it plug-and-play and more flexible and practical than other defense strategies. Multiple potential backdoor attacks are simulated on three datasets, and the experimental results show that the proposed defense performs well against all of them.
In future research, we will refine the training of the information purification network to further improve classification accuracy on clean samples, and extend the defense strategy to other tasks and disciplines, such as face recognition, autonomous driving, and credit risk assessment.
References
-
Biggio B, Corona I, Maiorca D, Nelson B, Šrndić N, Laskov P, Giacinto G and Roli F. 2013. Evasion attacks against machine learning at test time//Proceedings of 2013 Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Prague, Czech Republic: Springer: 387-402 [DOI: 10.1007/978-3-642-40994-3_25]
-
Carlini N and Wagner D. 2017. Towards evaluating the robustness of neural networks//2017 IEEE Symposium on Security and Privacy (S&P). San Jose, USA: IEEE: 39-57 [DOI: 10.1109/SP.2017.49]
-
Collobert R and Weston J. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland: Association for Computing Machinery: 160-167 [DOI: 10.1145/1390156.1390177]
-
Ding C X, Tao D C. 2015. Robust face recognition via multimodal deep face representation. IEEE Transactions on Multimedia, 17(11): 2049-2058 [DOI:10.1109/TMM.2015.2477042]
-
Doan B G, Abbasnejad E and Ranasinghe D C. 2020. Februus: input purification defense against trojan attacks on deep neural network systems//Proceedings of 2020 Annual Computer Security Applications Conference. Austin, USA: ACM: 897-912 [DOI: 10.1145/3427228.3427264]
-
Fang S H and Choromanska A. 2022. Backdoor attacks on the DNN interpretation system//The 36th AAAI Conference on Artificial Intelligence, AAAI 2022, the 34th Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, the 12th Symposium on Educational Advances in Artificial Intelligence. Palo Alto, USA: AAAI Press: [s. n. ]
-
Gao Y S, Doan B G, Zhang Z, Ma S Q, Zhang J L, Fu A M, Nepal S and Kim H. 2020. Backdoor attacks and countermeasures on deep learning: a comprehensive review[EB/OL]. [2022-04-23]. https://arxiv.org/pdf/2007.10760.pdf
-
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
-
Goodfellow I J, Shlens J and Szegedy C. 2014. Explaining and harnessing adversarial examples//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n. ]
-
Gu T Y, Dolan-Gavitt B and Garg S. 2017. BadNets: identifying vulnerabilities in the machine learning model supply chain[EB/OL]. [2022-08-22]. https://arxiv.org/pdf/1708.06733.pdf
-
Gu T Y, Liu K, Dolan-Gavitt B, Garg S. 2019. BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access, 7: 47230-47244 [DOI:10.1109/ACCESS.2019.2909068]
-
He K M, Zhang X Y, Ren S Q and Sun J. 2016a. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
-
He K M, Zhang X Y, Ren S Q and Sun J. 2016b. Identity mappings in deep residual networks//Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, the Netherlands: Springer: 630-645 [DOI: 10.1007/978-3-319-46493-0_38]
-
Hidasi B, Quadrana M, Karatzoglou A and Tikk D. 2016. Parallel recurrent neural network architectures for feature-rich session-based recommendations//Proceedings of the 10th ACM Conference on Recommender Systems. Boston, USA: ACM: 241-248 [DOI: 10.1145/2959100.2959167]
-
Kingma D P and Ba J. 2014. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n. ]
-
Kristjanpoller W, Minutolo M C. 2015. Gold price volatility: a forecasting approach using the artificial neural network——GARCH model. Expert Systems with Applications, 42(20): 7245-7251 [DOI:10.1016/j.eswa.2015.04.058]
-
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc: 1097-1105
-
Liu F C, Nan B, Miao Y W. 2022. Point cloud replacement adversarial attack based on saliency map. Journal of Image and Graphics, 27(2): 500-510 (刘复昌, 南博, 缪永伟. 2022. 基于显著性图的点云替换对抗攻击. 中国图象图形学报, 27(2): 500-510) [DOI:10.11834/jig.210546]
-
Liu K, Dolan-Gavitt B and Garg S. 2018. Fine-pruning: defending against backdooring attacks on deep neural networks//Proceedings of the 21st International Symposium on Research in Attacks, Intrusions, and Defenses. Heraklion, Greece: Springer: 273-294 [DOI: 10.1007/978-3-030-00470-5_13]
-
Liu Y T, Xie Y and Srivastava A. 2017. Neural trojans//Proceedings of 2017 IEEE International Conference on Computer Design (ICCD). Boston, USA: IEEE: 45-48 [DOI: 10.1109/ICCD.2017.16]
-
Melis M, Demontis A, Biggio B, Brown G, Fumera G and Roli F. 2017. Is deep learning safe for robot vision? Adversarial examples against the iCub humanoid//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE: 751-759 [DOI: 10.1109/iccvw.2017.94]
-
Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. 2015. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533 [DOI:10.1038/nature14236]
-
Mohammadi N, Zangeneh M. 2016. Customer credit risk assessment using artificial neural networks. Information Technology and Computer Science, 8(3): 58-66 [DOI:10.5815/ijitcs.2016.03.07]
-
Moosavi-Dezfooli S M, Fawzi A, Fawzi O and Frossard P. 2017. Universal adversarial perturbations//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 86-94 [DOI: 10.1109/CVPR.2017.17]
-
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
-
Sarkar E, Alkindi Y, Maniatakos M. 2020. Backdoor suppression in neural networks using input fuzzing and majority voting. IEEE Design and Test, 37(2): 103-110 [DOI:10.1109/MDAT.2020.2968275]
-
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n. ]
-
Sun S, Zhang W M, Fang H, Yu N H. 2022. Automatic generation of Chinese document watermarking fonts. Journal of Image and Graphics, 27(1): 262-276 (孙杉, 张卫明, 方涵, 俞能海. 2022. 中文水印字库的自动生成方法. 中国图象图形学报, 27(1): 262-276) [DOI:10.11834/jig.200695]
-
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9 [DOI: 10.1109/CVPR.2015.7298594]
-
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I J and Fergus R. 2014. Intriguing properties of neural networks//Proceedings of the 2nd International Conference on Learning Representations. Banff, USA: [s. n. ]
-
Villarreal-Vasquez M and Bhargava B. 2020. ConFoc: content-focus protection against trojan attacks on neural networks[EB/OL]. [2022-07-01]. https://arxiv.org/pdf/2007.00711.pdf
-
Wang B H, Cao X Y, Jia J Y and Gong N Z. 2020. On certifying robustness against backdoor attacks via randomized smoothing//Proceedings of CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision. [s. l. ]: [s. n. ]
-
Wang B L, Yao Y S, Shan S, Li H Y, Viswanath B, Zheng H T and Zhao B Y. 2019. Neural cleanse: identifying and mitigating backdoor attacks in neural networks//Proceedings of 2019 IEEE Symposium on Security and Privacy (S&P). San Francisco, USA: IEEE: 707-723 [DOI: 10.1109/SP.2019.00031]
-
Wang Y, Cao T Y, Yang J B, Zheng Y F, Fang Z, Deng X T. 2022. A perturbation constraint related weak perceptual adversarial example generation method. Journal of Image and Graphics, 27(7): 2287-2299 (王杨, 曹铁勇, 杨吉斌, 郑云飞, 方正, 邓小桐. 2022. 结合扰动约束的低感知性对抗样本生成方法. 中国图象图形学报, 27(7): 2287-2299) [DOI:10.11834/jig.200681]
-
Weber M, Xu X J, Karlaš B, Zhang C and Li B. 2020. RAB: provable robustness against backdoor attacks[EB/OL]. [2022-06-21]. https://arxiv.org/pdf/2003.08904.pdf
-
Xu K, Ba J, Kiros R, Cho K, Courville A C, Salakhutdinov R, Zemel R S and Bengio Y. 2015. Show, attend and tell: neural image caption generation with visual attention//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR. org: 2048-2057
-
Yin X L, Lu W, Zhang J H, Luo X Y. 2022. Robust JPEG steganography based on lossless carrier and robust cost. Journal of Image and Graphics, 27(1): 238-251 (尹晓琳, 卢伟, 张俊鸿, 罗向阳. 2022. 无损载体和鲁棒代价结合的JPEG图像鲁棒隐写. 中国图象图形学报, 27(1): 238-251) [DOI:10.11834/jig.210406]
-
Yu F, Wang D Q, Shelhamer E and Darrell T. 2018. Deep layer aggregation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2403-2412 [DOI: 10.1109/CVPR.2018.00255]
-
Zhang X P and Chen C. 2015. Research on credit risk evaluation for small and medium-sized enterprises in supply chain based on BP neural network//Proceedings of 2015 International Conference on Computational Science and Engineering (ICCSE). [s. l. ]: Atlantis Press: 213-217 [DOI: 10.2991/iccse-15.2015.37]