Non-semantic information suppression relevant backdoor defense implementation
Vol. 28, Issue 3, pp. 836-849 (2023)
Published: 16 March 2023
Accepted: 10 November 2022
DOI: 10.11834/jig.220421
Yusheng Guo, Zhenxing Qian, Xinpeng Zhang, Hongfeng Chai. Non-semantic information suppression relevant backdoor defense implementation [J]. Journal of Image and Graphics, 28(3): 836-849 (2023)
Objective
Backdoor attacks have become a major threat to today's convolutional neural networks. However, current backdoor defense methods usually require some prior knowledge of the backdoor attack or of the neural network model, which limits their application scenarios. Based on the image classification task, this paper proposes a backdoor defense method built on non-semantic information suppression; it requires no such prior knowledge, and the defense is achieved simply by encoding and decoding the network's input.
Method
The core idea is to weaken, as much as possible, the information in the original sample that is irrelevant to the image semantics while keeping those semantics unchanged, thereby suppressing the trigger. Non-semantic information is suppressed by placing a plug-and-play U-shaped network, the information purification network, in front of the model to be protected. Its input is a clean original sample, and its output is called the reinforced sample. During training, multiple clean classifiers with different architectures are first trained using different hyperparameters; the information purification network is then optimized to make the difference between the reinforced sample and the original sample as large as possible, under the constraint that the reinforced sample is still correctly classified by those classifiers. A compact formulation of this objective is sketched after this section.
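The following formulation is a sketch in our own notation; the symbols G for the purification network, f_k for the K clean classifiers, and the balance weight λ are assumptions, since the abstract does not give the exact loss:

```latex
% Sketch of the training objective: keep the reinforced sample G(x)
% correctly classified by every frozen clean classifier f_k while pushing
% it as far from the original sample x (with true label y) as possible.
\mathcal{L}(x, y) =
  \underbrace{\frac{1}{K}\sum_{k=1}^{K}
    \mathrm{CE}\bigl(f_k(G(x)),\, y\bigr)}_{\text{semantic retention}}
  \;-\;
  \lambda\,\underbrace{\bigl\lVert G(x) - x \bigr\rVert_2^2}_{\text{non-semantic suppression}}
```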
Result
Experiments are conducted on the MNIST, CIFAR10, and ImageNet10 datasets. The results show that after samples are encoded and decoded by the information purification network, the classification accuracy on clean samples drops only slightly, the success rate of backdoor attacks falls sharply, and trigger-carrying samples are correctly predicted with an accuracy close to that of clean samples.
Conclusion
The proposed non-semantic information suppression defense can correct trigger-carrying samples into normal samples without any prior knowledge, while maintaining the classification accuracy on clean samples.
Objective
Convolutional neural networks (CNNs) have shown their potential in computer science, electronic information, mathematics, and finance. However, their security is challenged in multiple domains. In supervised learning, an attacker can add trigger-carrying samples to the training set and change their labels to a target label; at inference time, the trained model then predicts any sample with the trigger as the target label. Backdoor attacks of this kind severely threaten the interests of model owners, especially in high value-added areas such as financial security. A series of defense strategies have been implemented to protect neural network models from backdoor attacks. However, conventional defense methods often require prior knowledge of the attack method or of the neural network model, such as the type and size of the trigger; such knowledge is often unavailable, which limits the application scenarios of these defenses. To resolve this problem, we develop a backdoor defense method for image classification that modifies the model input, called the information purification network (IPN). Processing samples with the IPN eliminates the impact of trigger-added samples.
Method
Image samples carry a large amount of redundant information, so we divide the image information into two categories: 1) semantic information relevant to the classification task, and 2) non-semantic information irrelevant to the classification task. To make trigger-carrying samples be predicted as the target label, a backdoor attack forces the model to pay attention to the non-semantic information of the sample during training. To suppress the trigger, our IPN is a CNN that encodes and decodes the input samples, aiming to keep the image semantics unchanged while minimizing the non-semantic information of the original samples. The inputs to the IPN are clean samples, and its outputs are the modified samples. For training, several clean classifiers are first trained with multiple architectures and training hyperparameters. The IPN is then optimized to make the difference between the modified sample and the original sample as large as possible, on the premise that the modified sample is still correctly predicted by the above classifiers. The loss function therefore consists of two terms: 1) semantic information retention, and 2) non-semantic information suppression. The weights of the two terms are balanced to control how far the modified sample may deviate from the original. Decoding a sample through the IPN disrupts the structure of the trigger, so the sample will not be predicted as the target label even if a backdoor has been injected into the model. In addition, because the semantic information of the image is not weakened, trigger-carrying samples are predicted as their correct labels whether or not the model contains a backdoor.
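The PyTorch sketch below illustrates this training procedure under stated assumptions: the IPN module and the frozen clean classifiers are passed in as placeholders, MSE is assumed as the image-difference measure, and the loss weight lam is a hypothetical hyperparameter (the abstract does not report the actual values).

```python
# Minimal IPN training sketch (assumptions: U-Net-style ipn module, frozen
# clean classifiers, MSE image difference, Adam optimizer).
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F

def train_ipn(ipn: nn.Module,
              clean_classifiers: List[nn.Module],
              loader: torch.utils.data.DataLoader,
              lam: float = 0.1,       # balance weight (hypothetical value)
              epochs: int = 10,
              device: str = "cuda") -> nn.Module:
    """Optimize the IPN so modified samples stay correctly classified
    while drifting as far from the originals as possible."""
    ipn.to(device).train()
    for f in clean_classifiers:      # the clean classifiers stay frozen
        f.to(device).eval()
        for p in f.parameters():
            p.requires_grad_(False)

    opt = torch.optim.Adam(ipn.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_mod = ipn(x)           # encode-decode: the "modified" sample

            # 1) Semantic retention: each clean classifier must still
            #    predict the correct label on the modified sample.
            sem = sum(F.cross_entropy(f(x_mod), y)
                      for f in clean_classifiers) / len(clean_classifiers)

            # 2) Non-semantic suppression: maximize the image difference
            #    (negative sign turns maximization into minimization).
            sup = -F.mse_loss(x_mod, x)

            loss = sem + lam * sup
            opt.zero_grad()
            loss.backward()
            opt.step()
    return ipn
```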
Result
All experiments are performed on an NVIDIA GeForce RTX 3090 graphics card; the execution environment is Python 3.8.5 with PyTorch 1.9.1. Three datasets are tested: MNIST, CIFAR10, and ImageNet10. The ImageNet10 dataset is constructed by randomly selecting 10 categories from the ImageNet dataset, comprising 12 831 images in total; we randomly select 10 264 images as the training set and use the remaining 2 567 images as the test set. The architecture of the IPN is U-Net. To evaluate the defense performance of the proposed strategy in detail, a variety of triggers is used to implement backdoor attacks. On MNIST, the clean model classifies the original clean samples with 99% accuracy. We implement backdoor attacks with two different triggers; the average classification accuracy on clean samples remains 99%, and the success rate of each backdoor attack is 100%. After all samples are encoded and decoded by the IPN, the classification accuracy on clean samples stays essentially unchanged, the success rate of backdoor attacks drops to 10%, and 98% of backdoor samples are predicted as their correct labels. The results on the other two datasets are similar to those on MNIST: the classification accuracy on clean samples decreases slightly, the success rate of backdoor attacks falls to about 10%, and backdoor samples are correctly predicted with high accuracy. It should be noted that the intensity and size of the trigger affect the defensive performance to a certain extent, and that the weighting between the two parts of the loss function affects the accuracy on clean samples: the weight of the non-semantic information suppression loss is positively correlated with the image difference and negatively correlated with the classification accuracy on clean samples.
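A split like the ImageNet10 one described above could be reproduced along the following lines; the folder layout, torchvision usage, and random seed are assumptions, since the paper does not specify its selection procedure.

```python
# Hypothetical reconstruction of the ImageNet10 train/test split
# (10 randomly chosen ImageNet classes, 12 831 images -> 10 264 / 2 567).
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)),
                          transforms.ToTensor()])
full = datasets.ImageFolder("imagenet10/", transform=tfm)  # assumed layout

n_train = 10_264                      # split sizes reported in the paper
n_test = len(full) - n_train          # 2 567 images
train_set, test_set = random_split(
    full, [n_train, n_test],
    generator=torch.Generator().manual_seed(0))  # seed is an assumption
```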
Conclusion
Our proposed strategy requires no prior knowledge of the triggers or of the models to be protected. The classification accuracy on clean samples stays essentially unchanged, the success rate of backdoor attacks becomes equivalent to random guessing, and backdoor samples are predicted as their correct labels by the classifier, regardless of whether a backdoor has been injected into it. Training the IPN requires only clean training data and knowledge of the protected model's task. When the defense is deployed, the IPN is simply placed in front of the protected model to preprocess the input samples. Multiple backdoor attacks are simulated on the three datasets mentioned above, and the experimental results show that our defense strategy performs well across these heterogeneous attack settings.
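Since deployment is plug-and-play, the composition can be sketched as below; the module names are illustrative, and only the IPN-before-model ordering comes from the paper.

```python
# Plug-and-play defense at inference time: the trained IPN preprocesses
# every input before it reaches the (possibly backdoored) model.
import torch.nn as nn

class DefendedModel(nn.Module):
    def __init__(self, ipn: nn.Module, protected_model: nn.Module):
        super().__init__()
        self.ipn = ipn                          # trained purification network
        self.protected_model = protected_model  # model to be protected

    def forward(self, x):
        return self.protected_model(self.ipn(x))  # purify, then classify
```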
Keywords: convolutional neural network (CNN); model security; image classification; neural network backdoor; backdoor defense