Image-imperceptible backdoor attacks
Vol. 28, Issue 3, Pages: 864-877 (2023)
Published: 16 March 2023
Accepted: 04 October 2022
DOI: 10.11834/jig.220550
Shuwen Zhu, Ge Luo, Ping Wei, Sheng Li, Xinpeng Zhang, Zhenxing Qian. Image-imperceptible backdoor attacks [J]. Journal of Image and Graphics, 28(3): 864-877 (2023)
Objective
Image backdoor attacks are a classic form of adversarial attack: the attacked deep model behaves well under normal conditions, but malicious results appear once the hidden backdoor is activated by a predefined trigger. Existing backdoor attacks have started to assign clean labels to poisoned samples or to conceal the trigger within the poisoned data so as to evade human inspection, yet these methods can hardly possess both security properties at the same time under visual supervision, and their triggers can easily be detected by statistical analysis. Therefore, an imperceptible and effective image backdoor attack method is proposed.
Method
First, the backdoor trigger is concealed in the image through information hiding, so that a correctly labeled poisoned image sample (label imperceptibility) looks almost identical to the corresponding clean image sample (image imperceptibility). Second, a new backdoor attack paradigm is designed in which the source class of the poisoned images is also the target class. The proposed backdoor attack is not only visually imperceptible but can also resist classical backdoor defenses (statistical imperceptibility).
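To make the clean-label, one-to-oneself poisoning step concrete, here is a minimal sketch. It is not the authors' released code: the dataset layout (a list of (image, label) pairs), the `embed_trigger` callable, and the 7% poisoning ratio are illustrative assumptions.

```python
# Illustrative sketch of one-to-oneself, clean-label poisoning (not the authors' code).
# `embed_trigger` stands in for the information-hiding step; labels are never changed.
import random

def build_poisoned_set(dataset, target_class, embed_trigger, poison_ratio=0.07):
    """Hide the trigger in a fraction of target-class images; keep every label."""
    target_idx = [i for i, (_, label) in enumerate(dataset) if label == target_class]
    chosen = set(random.sample(target_idx, int(len(target_idx) * poison_ratio)))

    poisoned_set = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            image = embed_trigger(image)  # source class == target class, label stays correct
        poisoned_set.append((image, label))
    return poisoned_set
```

Because only images whose label already equals the target class are poisoned, every input-label pair in the resulting training set remains correctly matched.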
Result
To verify the effectiveness and imperceptibility of the method, comparative experiments with three other methods are conducted on the ImageNet, MNIST, and CIFAR-10 datasets. The results show that, on all three datasets, the classification accuracy on original clean samples drops by less than 1%, the classification accuracy on poisoned samples exceeds 94%, and the method achieves the best visual quality. In addition, it is verified that any image sample into which the proposed trigger is temporarily injected can launch an effective backdoor attack.
Conclusion
The proposed backdoor attack possesses security-relevant properties such as clean labels, imperceptible triggers, and statistical imperceptibility, making it harder to notice for humans and harder to detect with statistical methods.
Objective
Deep models are vulnerable to multiple forms of adversarial attack. Among them, backdoor attacks make the attacked model behave well on regular inputs while behaving maliciously once the hidden backdoor is activated by a predefined trigger. A backdoor attack injects predesigned triggers (e.g., specific patterns such as a square, noise, stripes, or warping) into a portion of the training data. To guarantee attack effectiveness while evading human inspection, existing backdoor attacks focus on assigning clean labels to the poisoned samples or on hiding the triggers in the poisoned data. Nevertheless, it remains challenging to possess both security properties at the same time under visual supervision. To resolve this problem, we develop an imperceptible and effective backdoor attack method that is imperceptible to human inspection, filtering, and statistical detectors.
Method
To generate poisoned samples, a smaller trigger image is embedded into carrier images through information hiding, and the poisoned samples are mixed with clean samples to form the final training data. Because the trigger is hidden naturally, a correctly labeled poisoned sample (label imperceptibility) looks almost the same as its corresponding clean sample (image imperceptibility), and it can also withstand the most advanced statistical analysis (statistical imperceptibility). We further develop a one-to-oneself attack paradigm in which the source class selected for poisoning is the target class itself. Different from previous attack paradigms (all-to-one and all-to-all), a portion of images from the target class is selected as pre-poisoned samples. Since their labels remain correct and match the target class, these images are imperceptible under human inspection. In contrast, the classical all-to-one and all-to-all paradigms rely on mismatched or erroneous labels, and the source class cannot be the target class itself; mislabeled input-label pairs (such as a bird image labeled as cat) may arouse suspicion under human inspection and thus reveal the attack, and after a filtering process the remaining samples (most of them clean) could invalidate the attack. Our attack can also be launched quickly against a pre-trained model by fine-tuning it on the same poisoned data. The resulting accuracy-consistent backdoor attack is imperceptible from the label, image, and statistical perspectives.
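The information-hiding step is only described at a high level above. As a rough stand-in, the sketch below embeds the most significant bits of a trigger image into the least significant bits of a cover image. This LSB scheme is an illustrative assumption (the cited literature covers both LSB and deep steganography), not necessarily the hiding technique used in the paper; both inputs are assumed to be uint8 arrays of the same shape.

```python
# LSB-style stand-in for the trigger-hiding step (illustrative assumption only).
import numpy as np

def lsb_embed(cover: np.ndarray, trigger: np.ndarray, bits: int = 2) -> np.ndarray:
    """Hide the top `bits` bits of `trigger` in the low `bits` bits of `cover`."""
    keep_mask = (0xFF >> bits) << bits            # e.g. 0b11111100 for bits=2
    cover_high = cover & keep_mask                # clear the cover's low bits
    trigger_high = trigger >> (8 - bits)          # keep only the trigger's top bits
    return (cover_high | trigger_high).astype(np.uint8)

def lsb_extract(stego: np.ndarray, bits: int = 2) -> np.ndarray:
    """Recover a coarse version of the hidden trigger from a stego image."""
    return ((stego & ((1 << bits) - 1)) << (8 - bits)).astype(np.uint8)
```

Under this sketch, a poisoned sample is simply `lsb_embed(clean_image, trigger_image)` with its original (target-class) label kept unchanged.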
Result
To verify the effectiveness and invisibility of the proposed method, experiments are conducted on the ImageNet, MNIST, and CIFAR-10 datasets and compared with three popular methods. For the one-to-oneself attack, poisoning only a small proportion (7%) of the original clean samples is enough to backdoor models that retain high accuracy on ImageNet, MNIST, and CIFAR-10. Compared with the clean model on all three datasets, the backdoor is not activated when clean samples are tested, and the decrease in clean accuracy of the poisoned model is less than 1%. Note that some backdoor attacks change the label of the poisoned image to the target label, so their mislabeled input-label pairs are easily detected in practice; we do not modify the labels of the trigger-injected images, so every input-label pair in our training set is correctly matched, unlike those of some classical methods. For the classical all-to-one setting, the proposed method classifies clean samples with comparable accuracy and achieves comparable attack success rates (more than 99%) when poisoned samples are tested. Moreover, unlike BadNets, our trigger is invisible to human visual inspection: the embedded trigger is imperceptible, and the poisoned image looks natural and is hard to distinguish from the original clean image. We also quantify invisibility with learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). Compared with the three baseline methods, the mean perceptual distance between our poisoned images and the original images is almost zero, with a near-zero LPIPS value. Our poisoned samples also achieve the highest SSIM values on the three datasets, showing that they are more similar to their corresponding benign images. Meanwhile, our attack achieves the highest PSNR values (more than 43 dB on ImageNet, MNIST, and CIFAR-10); on MNIST, the PSNR reaches 52.4 dB.
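As a reference for how the invisibility metrics above can be computed, here is a hedged sketch using scikit-image for PSNR/SSIM and the `lpips` package for LPIPS; the uint8 RGB array inputs and the AlexNet backbone are assumptions for illustration, not details taken from the paper.

```python
# Sketch of the invisibility metrics (PSNR, SSIM, LPIPS) for a clean/poisoned pair.
import numpy as np
import torch
import lpips                                      # https://github.com/richzhang/PerceptualSimilarity
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')                # AlexNet-based LPIPS (assumed backbone)

def invisibility_metrics(clean: np.ndarray, poisoned: np.ndarray):
    """clean/poisoned: uint8 RGB images of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(clean, poisoned, data_range=255)
    ssim = structural_similarity(clean, poisoned, channel_axis=-1, data_range=255)
    # LPIPS expects float tensors in [-1, 1] with shape (N, 3, H, W)
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    lp = lpips_fn(to_tensor(clean), to_tensor(poisoned)).item()
    return psnr, ssim, lp
```

Higher PSNR/SSIM and lower LPIPS indicate that a poisoned image is closer to its benign counterpart.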
Conclusion
An imperceptible backdoor attack is proposed in which the poisoned image keeps its valid label and carries an invisible trigger. The trigger is embedded in the image invisibly through data hiding, so the poisoned images remain similar to the original clean ones. The user stays unaware of any abnormality during the whole process, while other attackers cannot exploit the trigger. In addition, a new attack paradigm, the one-to-oneself attack, is designed for clean-label backdoor attacks: the original label is kept consistent when the selected trigger is used to poison the images. Under this new paradigm, most defenses become invalid because they assume that poisoned samples carry changed labels. In summary, the proposed backdoor attack is imperceptible with respect to the label, the image, and statistical analysis.
Keywords: backdoor attack; imperceptible trigger; attack paradigm; clean label; statistical imperceptibility
Baluja S. 2017. Hiding images in plain sight: deep steganography//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 2066-2076
Barni M, Kallas K and Tondi B. 2019. A new backdoor attack in CNNs by training set corruption without label poisoning//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 101-105 [DOI: 10.1109/ICIP.2019.8802997]
Chen X Y, Liu C, Li B, Lu K and Song D. 2017. Targeted backdoor attacks on deep learning systems using data poisoning [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1712.05526.pdf
Cox I J, Miller M L, Bloom J A, Fridrich J and Kalker T. 2007. Digital Watermarking and Steganography. San Francisco: Morgan Kaufmann
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Gao Y S, Xu C E, Wang D R, Chen S P, Ranasinghe D C and Nepal S. 2019. STRIP: a defence against Trojan attacks on deep neural networks//Proceedings of the 35th Annual Computer Security Applications Conference. Puerto Rico, USA: ACM: 113-125 [DOI: 10.1145/3359789.3359790]
Gu T, Dolan-Gavitt B and Garg S. 2017. BadNets: identifying vulnerabilities in the machine learning model supply chain [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1708.06733.pdf
Gu T Y, Liu K, Dolan-Gavitt B and Garg S. 2019. BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access, 7: 47230-47244 [DOI: 10.1109/ACCESS.2019.2909068]
Gupta S, Goyal A and Bhushan B. 2012. Information hiding using least significant bit steganography and cryptography. International Journal of Modern Education and Computer Science, 4(6): 27-34 [DOI: 10.5815/ijmecs.2012.06.04]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huynh-Thu Q and Ghanbari M. 2008. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13): 800-801 [DOI: 10.1049/el:20080522]
Kolouri S, Saha A, Pirsiavash H and Hoffmann H. 2020. Universal litmus patterns: revealing backdoor attacks in CNNs//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 301-310 [DOI: 10.1109/cvpr42600.2020.00038]
Krizhevsky A. 2019. Learning multiple layers of features from tiny images [EB/OL]. [2022-05-28]. http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf
LeCun Y, Jackel L D, Bottou L, Cortes C, Denker J S, Drucker H, Guyon I, Muller U A, Sackinger E, Simard P and Vapnik V. 1995. Learning algorithms for classification: a comparison on handwritten digit recognition//Neural Networks: The Statistical Mechanics Perspective. Korea(South): World Scientific: 261-276
Li S F, Xue M H, Zhao B Z H, Zhu H J and Zhang X P. 2021a. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Transactions on Dependable and Secure Computing, 18(5): 2088-2105 [DOI: 10.1109/TDSC.2020.3021407]
Li Y Z, Li Y M, Wu B Y, Li L K, He R and Lyu S W. 2021b. Invisible backdoor attack with sample-specific triggers//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16463-16472 [DOI: 10.1109/ICCV48922.2021.01615]
Liu K, Dolan-Gavitt B and Garg S. 2018a. Fine-pruning: defending against backdooring attacks on deep neural networks//The 21st International Symposium on Research in Attacks, Intrusions, and Defenses. Heraklion, Greece: Springer: 273-294 [DOI: 10.1007/978-3-030-00470-5_13]
Liu Y F, Ma X J, Bailey J and Lu F. 2020. Reflection backdoor: a natural backdoor attack on deep neural networks//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 182-199 [DOI: 10.1007/978-3-030-58607-2_11]
Liu Y Q, Lee W C, Tao G H, Ma S Q, Aafer Y and Zhang X Y. 2019. ABS: scanning neural networks for back-doors by artificial brain stimulation//Proceedings of 2019 ACM SIGSAC Conference on Computer and Communications Security. London, UK: ACM: 1265-1282 [DOI: 10.1145/3319535.3363216]
Liu Y Q, Ma S Q, Aafer Y, Lee W C, Zhai J, Wang W H and Zhang X Y. 2018b. Trojaning attack on neural networks//Proceedings of 2018 Network and Distributed System Security Symposium. San Diego, USA: Internet Society [DOI: 10.14722/ndss.2018.23291]
Nguyen A and Tran A. 2021. WaNet: imperceptible warping-based backdoor attack [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/2102.10369.pdf
Quiring E and Rieck K. 2020. Backdooring and poisoning neural networks with image-scaling attacks//Proceedings of 2020 IEEE Security and Privacy Workshops. San Francisco, USA: IEEE: 41-47 [DOI: 10.1109/SPW50608.2020.00024]
Saha A, Subramanya A and Pirsiavash H. 2020. Hidden trigger backdoor attacks//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11957-11965 [DOI: 10.1609/aaai.v34i07.6871]
Shafahi A, Huang W R, Najibi M, Suciu O, Studer C, Dumitras T and Goldstein T. 2018. Poison frogs! targeted clean-label poisoning attacks on neural networks//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 6106-6116
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1409.1556.pdf
Tran B, Li J and Madry A. 2018. Spectral signatures in backdoor attacks//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 8011-8021
Turner A, Tsipras D and Madry A. 2018. Clean-label backdoor attacks [EB/OL]. [2022-05-28]. https://openreview.net/pdf?id=HJg6e2CcK7
Turner A, Tsipras D and Madry A. 2019. Label-consistent backdoor attacks [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1912.02771/
Wang B L, Yao Y S, Shan S, Li H Y, Viswanath B, Zheng H T and Zhao B Y. 2019. Neural cleanse: identifying and mitigating backdoor attacks in neural networks//2019 IEEE Symposium on Security and Privacy. San Francisco, USA: IEEE: 707-723 [DOI: 10.1109/SP.2019.00031]
Yao Y S, Li H Y, Zheng H T and Zhao B Y. 2019. Latent backdoor attacks on deep neural networks//Proceedings of 2019 ACM SIGSAC Conference on Computer and Communications Security. London, UK: ACM: 2041-2055 [DOI: 10.1145/3319535.3354209]
Zagoruyko S and Komodakis N. 2016. Wide residual networks [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1605.07146.pdf
Zeng Y, Park W, Mao Z M and Jia R X. 2021. Rethinking the backdoor attacks' triggers: a frequency perspective//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16473-16481 [DOI: 10.1109/iccv48922.2021.01616]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/cvpr.2018.00068]
Zhang X P, Wang S Z and Zhang K W. 2003. A novel LSB steganography scheme against statistical analysis. Journal of Image and Graphics, 8(9): 1055-1060 [DOI: 10.3969/j.issn.1006-8961.2003.09.013]
Zhang Y, Luo X Y, Wang J W, Lu W, Yang C F and Liu F L. 2022. Research progress on digital image robust steganography. Journal of Image and Graphics, 27(1): 3-26 [DOI: 10.11834/jig.210449]
Zhao P, Chen P Y, Das P, Ramamurthy K N and Lin X. 2020. Bridging mode connectivity in loss landscapes and adversarial robustness//Proceedings of 2020 International Conference on Learning Representations. Addis Ababa: ICLR: 1-25