Image-imperceptible backdoor attacks
Vol. 28, Issue 3, Pages: 864-877 (2023)
Published: 16 March 2023
Accepted: 04 October 2022
DOI: 10.11834/jig.220550
Shuwen Zhu, Ge Luo, Ping Wei, Sheng Li, Xinpeng Zhang, Zhenxing Qian. Image-imperceptible backdoor attacks [J]. Journal of Image and Graphics, 28(3): 864-877 (2023)
Objective
Image backdoor attacks are a classic form of adversarial attack: the attacked deep model behaves well under normal conditions, but malicious results appear once the hidden backdoor is activated by a predefined trigger. Existing backdoor attacks have started to assign clean labels to poisoned samples or to conceal the trigger within the poisoned data so as to evade human inspection, yet these methods can hardly possess both security properties at the same time under visual supervision, and their triggers can easily be detected by statistical analysis. Therefore, an imperceptible and effective image backdoor attack method is proposed.
Method
First, the backdoor trigger is concealed in the image through information hiding, so that a correctly labeled poisoned image sample (label imperceptibility) looks almost identical to the corresponding clean image sample (image imperceptibility). Second, a new backdoor attack paradigm is designed in which the source class of the poisoned images is also the target class. The proposed backdoor attack is not only visually imperceptible but can also resist classical backdoor defenses (statistical imperceptibility).
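To make the clean-label, one-to-oneself poisoning step concrete, here is a minimal sketch. It is not the authors' released code: the dataset layout (a list of (image, label) pairs), the `embed_trigger` callable, and the 7% poisoning ratio are illustrative assumptions.

```python
# Illustrative sketch of one-to-oneself, clean-label poisoning (not the authors' code).
# `embed_trigger` stands in for the information-hiding step; labels are never changed.
import random

def build_poisoned_set(dataset, target_class, embed_trigger, poison_ratio=0.07):
    """Hide the trigger in a fraction of target-class images; keep every label."""
    target_idx = [i for i, (_, label) in enumerate(dataset) if label == target_class]
    chosen = set(random.sample(target_idx, int(len(target_idx) * poison_ratio)))

    poisoned_set = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            image = embed_trigger(image)  # source class == target class, label stays correct
        poisoned_set.append((image, label))
    return poisoned_set
```

Because only images whose label already equals the target class are poisoned, every input-label pair in the resulting training set remains correctly matched.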
Result
To verify the effectiveness and imperceptibility of the method, comparative experiments with three other methods are conducted on the ImageNet, MNIST, and CIFAR-10 datasets. The results show that, on all three datasets, the classification accuracy on original clean samples drops by less than 1%, the classification accuracy on poisoned samples exceeds 94%, and the method achieves the best visual quality. In addition, it is verified that any image sample into which the proposed trigger is temporarily injected can launch an effective backdoor attack.
Conclusion
The proposed backdoor attack possesses security-relevant properties such as clean labels, imperceptible triggers, and statistical imperceptibility, making it harder to notice for humans and harder to detect with statistical methods.
Objective
Deep models are vulnerable to multiple forms of adversarial attack. Among them, backdoor attacks make the attacked model behave well on regular inputs while behaving maliciously once the hidden backdoor is activated by a predefined trigger. A backdoor attack injects predesigned triggers (e.g., specific patterns such as a square, noise, stripes, or warping) into a portion of the training data. To guarantee attack effectiveness while evading human inspection, existing backdoor attacks focus on assigning clean labels to the poisoned samples or on hiding the triggers in the poisoned data. Nevertheless, it remains challenging to possess both security properties at the same time under visual supervision. To resolve this problem, we develop an imperceptible and effective backdoor attack method that is imperceptible to human inspection, filtering, and statistical detectors.
Method
To generate poisoned samples, a smaller trigger image is embedded into carrier images through information hiding, and the poisoned samples are mixed with clean samples to form the final training data. Because the trigger is hidden naturally, a correctly labeled poisoned sample (label imperceptibility) looks almost the same as its corresponding clean sample (image imperceptibility), and it can also withstand the most advanced statistical analysis (statistical imperceptibility). We further develop a one-to-oneself attack paradigm in which the source class selected for poisoning is the target class itself. Different from previous attack paradigms (all-to-one and all-to-all), a portion of images from the target class is selected as pre-poisoned samples. Since their labels remain correct and match the target class, these images are imperceptible under human inspection. In contrast, the classical all-to-one and all-to-all paradigms rely on mismatched or erroneous labels, and the source class cannot be the target class itself; mislabeled input-label pairs (such as a bird image labeled as cat) may arouse suspicion under human inspection and thus reveal the attack, and after a filtering process the remaining samples (most of them clean) could invalidate the attack. Our attack can also be launched quickly against a pre-trained model by fine-tuning it on the same poisoned data. The resulting accuracy-consistent backdoor attack is imperceptible from the label, image, and statistical perspectives.
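The information-hiding step is only described at a high level above. As a rough stand-in, the sketch below embeds the most significant bits of a trigger image into the least significant bits of a cover image. This LSB scheme is an illustrative assumption (the cited literature covers both LSB and deep steganography), not necessarily the hiding technique used in the paper; both inputs are assumed to be uint8 arrays of the same shape.

```python
# LSB-style stand-in for the trigger-hiding step (illustrative assumption only).
import numpy as np

def lsb_embed(cover: np.ndarray, trigger: np.ndarray, bits: int = 2) -> np.ndarray:
    """Hide the top `bits` bits of `trigger` in the low `bits` bits of `cover`."""
    keep_mask = (0xFF >> bits) << bits            # e.g. 0b11111100 for bits=2
    cover_high = cover & keep_mask                # clear the cover's low bits
    trigger_high = trigger >> (8 - bits)          # keep only the trigger's top bits
    return (cover_high | trigger_high).astype(np.uint8)

def lsb_extract(stego: np.ndarray, bits: int = 2) -> np.ndarray:
    """Recover a coarse version of the hidden trigger from a stego image."""
    return ((stego & ((1 << bits) - 1)) << (8 - bits)).astype(np.uint8)
```

Under this sketch, a poisoned sample is simply `lsb_embed(clean_image, trigger_image)` with its original (target-class) label kept unchanged.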
Result
To verify the effectiveness and invisibility of the proposed method, experiments are conducted on the ImageNet, MNIST, and CIFAR-10 datasets and compared with three popular methods. For the one-to-oneself attack, poisoning only a small proportion (7%) of the original clean samples is enough to backdoor models that retain high accuracy on ImageNet, MNIST, and CIFAR-10. Compared with the clean model on all three datasets, the backdoor is not activated when clean samples are tested, and the decrease in clean accuracy of the poisoned model is less than 1%. Note that some backdoor attacks change the label of the poisoned image to the target label, so their mislabeled input-label pairs are easily detected in practice; we do not modify the labels of the trigger-injected images, so every input-label pair in our training set is correctly matched, unlike those of some classical methods. For the classical all-to-one setting, the proposed method classifies clean samples with comparable accuracy and achieves comparable attack success rates (more than 99%) when poisoned samples are tested. Moreover, unlike BadNets, our trigger is invisible to human visual inspection: the embedded trigger is imperceptible, and the poisoned image looks natural and is hard to distinguish from the original clean image. We also quantify invisibility with learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). Compared with the three baseline methods, the mean perceptual distance between our poisoned images and the original images is almost zero, with a near-zero LPIPS value. Our poisoned samples also achieve the highest SSIM values on the three datasets, showing that they are more similar to their corresponding benign images. Meanwhile, our attack achieves the highest PSNR values (more than 43 dB on ImageNet, MNIST, and CIFAR-10); on MNIST, the PSNR reaches 52.4 dB.
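As a reference for how the invisibility metrics above can be computed, here is a hedged sketch using scikit-image for PSNR/SSIM and the `lpips` package for LPIPS; the uint8 RGB array inputs and the AlexNet backbone are assumptions for illustration, not details taken from the paper.

```python
# Sketch of the invisibility metrics (PSNR, SSIM, LPIPS) for a clean/poisoned pair.
import numpy as np
import torch
import lpips                                      # https://github.com/richzhang/PerceptualSimilarity
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')                # AlexNet-based LPIPS (assumed backbone)

def invisibility_metrics(clean: np.ndarray, poisoned: np.ndarray):
    """clean/poisoned: uint8 RGB images of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(clean, poisoned, data_range=255)
    ssim = structural_similarity(clean, poisoned, channel_axis=-1, data_range=255)
    # LPIPS expects float tensors in [-1, 1] with shape (N, 3, H, W)
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    lp = lpips_fn(to_tensor(clean), to_tensor(poisoned)).item()
    return psnr, ssim, lp
```

Higher PSNR/SSIM and lower LPIPS indicate that a poisoned image is closer to its benign counterpart.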
Conclusion
An imperceptible backdoor attack is proposed in which the poisoned image keeps its valid label and carries an invisible trigger. The trigger is embedded in the image invisibly through data hiding, so the poisoned images remain similar to the original clean ones. The user stays unaware of any abnormality during the whole process, while other attackers cannot exploit the trigger. In addition, a new attack paradigm, the one-to-oneself attack, is designed for clean-label backdoor attacks: the original label is kept consistent when the selected trigger is used to poison the images. Under this new paradigm, most defenses become invalid because they assume that poisoned samples carry changed labels. In summary, the proposed backdoor attack is imperceptible with respect to the label, the image, and statistical analysis.
Keywords: backdoor attack; imperceptible trigger; attack paradigm; clean label; statistical imperceptibility
Baluja S. 2017. Hiding images in plain sight: deep steganography//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 2066-2076
Barni M, Kallas K and Tondi B. 2019. A new backdoor attack in CNNs by training set corruption without label poisoning//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 101-105 [DOI: 10.1109/ICIP.2019.8802997]
Chen X Y, Liu C, Li B, Lu K and Song D. 2017. Targeted backdoor attacks on deep learning systems using data poisoning [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1712.05526.pdf
Cox I J, Miller M L, Bloom J A, Fridrich J and Kalker T. 2007. Digital Watermarking and Steganography. San Francisco: Morgan Kaufmann
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Gao Y S, Xu C E, Wang D R, Chen S P, Ranasinghe D C and Nepal S. 2019. STRIP: a defence against Trojan attacks on deep neural networks//Proceedings of the 35th Annual Computer Security Applications Conference. Puerto Rico, USA: ACM: 113-125 [DOI: 10.1145/3359789.3359790]
Gu T, Dolan-Gavitt B and Garg S. 2017. BadNets: identifying vulnerabilities in the machine learning model supply chain [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1708.06733.pdf
Gu T Y, Liu K, Dolan-Gavitt B and Garg S. 2019. BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access, 7: 47230-47244 [DOI: 10.1109/ACCESS.2019.2909068]
Gupta S, Goyal A and Bhushan B. 2012. Information hiding using least significant bit steganography and cryptography. International Journal of Modern Education and Computer Science, 4(6): 27-34 [DOI: 10.5815/ijmecs.2012.06.04]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huynh-Thu Q and Ghanbari M. 2008. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13): 800-801 [DOI: 10.1049/el:20080522]
Kolouri S, Saha A, Pirsiavash H and Hoffmann H. 2020. Universal litmus patterns: revealing backdoor attacks in CNNs//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 301-310 [DOI: 10.1109/cvpr42600.2020.00038]
Krizhevsky A. 2019. Learning multiple layers of features from tiny images [EB/OL]. [2022-05-28]. http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf
LeCun Y, Jackel L D, Bottou L, Cortes C, Denker J S, Drucker H, Guyon I, Muller U A, Sackinger E, Simard P and Vapnik V. 1995. Learning algorithms for classification: a comparison on handwritten digit recognition//Neural Networks: The Statistical Mechanics Perspective. Korea(South): World Scientific: 261-276
Li S F, Xue M H, Zhao B Z H, Zhu H J and Zhang X P. 2021a. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Transactions on Dependable and Secure Computing, 18(5): 2088-2105 [DOI: 10.1109/TDSC.2020.3021407]
Li Y Z, Li Y M, Wu B Y, Li L K, He R and Lyu S W. 2021b. Invisible backdoor attack with sample-specific triggers//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16463-16472 [DOI: 10.1109/ICCV48922.2021.01615]
Liu K, Dolan-Gavitt B and Garg S. 2018a. Fine-pruning: defending against backdooring attacks on deep neural networks//The 21st International Symposium on Research in Attacks, Intrusions, and Defenses. Heraklion, Greece: Springer: 273-294 [DOI: 10.1007/978-3-030-00470-5_13]
Liu Y F, Ma X J, Bailey J and Lu F. 2020. Reflection backdoor: a natural backdoor attack on deep neural networks//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 182-199 [DOI: 10.1007/978-3-030-58607-2_11]
Liu Y Q, Lee W C, Tao G H, Ma S Q, Aafer Y and Zhang X Y. 2019. ABS: scanning neural networks for back-doors by artificial brain stimulation//Proceedings of 2019 ACM SIGSAC Conference on Computer and Communications Security. London, UK: ACM: 1265-1282 [DOI: 10.1145/3319535.3363216]
Liu Y Q, Ma S Q, Aafer Y, Lee W C, Zhai J, Wang W H and Zhang X Y. 2018b. Trojaning attack on neural networks//Proceedings of 2018 Network and Distributed System Security Symposium. San Diego, USA: Internet Society [DOI: 10.14722/ndss.2018.23291]
Nguyen A and Tran A. 2021. WaNet: imperceptible warping-based backdoor attack [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/2102.10369.pdf
Quiring E and Rieck K. 2020. Backdooring and poisoning neural networks with image-scaling attacks//Proceedings of 2020 IEEE Security and Privacy Workshops. San Francisco, USA: IEEE: 41-47 [DOI: 10.1109/SPW50608.2020.00024]
Saha A, Subramanya A and Pirsiavash H. 2020. Hidden trigger backdoor attacks//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11957-11965 [DOI: 10.1609/aaai.v34i07.6871]
Shafahi A, Huang W R, Najibi M, Suciu O, Studer C, Dumitras T and Goldstein T. 2018. Poison frogs! targeted clean-label poisoning attacks on neural networks//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 6106-6116
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1409.1556.pdf
Tran B, Li J and Madry A. 2018. Spectral signatures in backdoor attacks//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 8011-8021
Turner A, Tsipras D and Madry A. 2018. Clean-label backdoor attacks [EB/OL]. [2022-05-28]. https://openreview.net/pdf?id=HJg6e2CcK7
Turner A, Tsipras D and Madry A. 2019. Label-consistent backdoor attacks [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1912.02771/
Wang B L, Yao Y S, Shan S, Li H Y, Viswanath B, Zheng H T and Zhao B Y. 2019. Neural cleanse: identifying and mitigating backdoor attacks in neural networks//2019 IEEE Symposium on Security and Privacy. San Francisco, USA: IEEE: 707-723 [DOI: 10.1109/SP.2019.00031]
Yao Y S, Li H Y, Zheng H T and Zhao B Y. 2019. Latent backdoor attacks on deep neural networks//Proceedings of 2019 ACM SIGSAC Conference on Computer and Communications Security. London, UK: ACM: 2041-2055 [DOI: 10.1145/3319535.3354209]
Zagoruyko S and Komodakis N. 2016. Wide residual networks [EB/OL]. [2022-05-28]. https://arxiv.org/pdf/1605.07146.pdf
Zeng Y, Park W, Mao Z M and Jia R X. 2021. Rethinking the backdoor attacks' triggers: a frequency perspective//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16473-16481 [DOI: 10.1109/iccv48922.2021.01616]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/cvpr.2018.00068]
Zhang X P, Wang S Z and Zhang K W. 2003. A novel LSB steganography scheme against statistical analysis. Journal of Image and Graphics, 8(9): 1055-1060 [DOI: 10.3969/j.issn.1006-8961.2003.09.013]
Zhang Y, Luo X Y, Wang J W, Lu W, Yang C F and Liu F L. 2022. Research progress on digital image robust steganography. Journal of Image and Graphics, 27(1): 3-26 [DOI: 10.11834/jig.210449]
Zhao P, Chen P Y, Das P, Ramamurthy K N and Lin X. 2020. Bridging mode connectivity in loss landscapes and adversarial robustness//Proceedings of 2020 International Conference on Learning Representations. Addis Ababa: ICLR: 1-25