结合扰动约束的低感知性对抗样本生成方法
A perturbation-constrained generation method for adversarial examples with low perceptibility
- 2022年27卷第7期 页码:2287-2299
收稿:2020-11-24,修回:2021-3-22,录用:2021-3-29,纸质出版:2022-07-16
DOI: 10.11834/jig.200681
目的
对抗样本是指在原始数据中添加细微干扰使深度模型输出错误结果的合成数据。视觉感知性和攻击成功率是评价对抗样本的两个关键指标。当前大多数对抗样本研究侧重于提升算法的攻击成功率,对视觉感知性的关注较少。为此,本文提出了一种低感知性对抗样本生成算法,构造的对抗样本在保证较高攻击成功率的情况下具有更低的视觉感知性。
方法
提出在黑盒条件下通过约束对抗扰动的面积与空间分布以降低对抗样本视觉感知性的方法。利用卷积网络提取图像中对输出结果影响较大的关键区域作为约束,限定扰动的位置。之后结合带有自注意力机制的生成对抗网络在关键区域添加扰动,最终生成具有低感知性的对抗样本。
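下面给出一个基于 PyTorch 的最小示意片段,说明如何利用关键区域掩码约束扰动的位置与幅度并与原图融合(其中 eps 等数值为假设值,仅作理解示意,并非论文的原始实现):

```python
import torch

def fuse_adversarial(x, mask, delta, eps=8 / 255):
    """将扰动 delta 限制在关键区域 mask 内并与原图 x 融合, 生成对抗样本。
    x: 原图, 取值范围 [0, 1]; mask: 关键区域图 (0~1); delta: 生成器输出的扰动;
    eps: 假设的扰动幅度上限。"""
    delta = torch.clamp(delta, -eps, eps)    # 限制扰动幅度
    x_adv = x + mask * delta                 # 仅在关键区域添加扰动
    return torch.clamp(x_adv, 0.0, 1.0)      # 保证结果仍为合法图像
```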
结果
在3种公开分类数据集上与多种典型攻击方法进行比较,包括7种白盒算法FGSM(fast gradient sign method)、BIM(basic iterative method)、DeepFool、PerC-C & W(perceptual color distance C & W)、JSMA(Jacobian-based saliency map attacks)、APGD(auto projected gradient descent)、AutoAttack和2种黑盒算法OnePixel、AdvGAN(adversarial generative adversarial network)。在攻击成功率(attack success rate,ASR)上,本文算法与对比算法处于同一水平。在客观视觉感知性对比中,本文算法较AdvGAN在低分辨率数据集上,均方误差(mean square error,MSE)值降低了42.1%,结构相似性值(structural similarity,SSIM)提升了8.4%;在中高分辨率数据集上,MSE值降低了72.7%,SSIM值提升了12.8%。与视觉感知性最好的对比算法DeepFool相比,在低分辨率数据集上,本文算法的MSE值降低了29.3%,SSIM值提升了0.8%。
结论
本文分析了当前算法在视觉感知性上存在的问题,提出了一种对抗样本生成方法,在攻击成功率近似的情况下显著降低了对抗样本的视觉感知性。
Objective
An adversarial example is synthetic data crafted by adding a subtle perturbation to an original input so that a deep neural model produces an erroneous output. The perturbation is the key factor in adversarial example generation: it should drive the model to a wrong prediction without visibly distorting the original image to human perception. Accordingly, visual imperceptibility and attack success rate are the two essential criteria for evaluating adversarial examples. Most current research focuses on raising the attack success rate, while the objective criteria used for visual imperceptibility are fairly uniform: for three-channel RGB images, a smaller pixel-value perturbation is assumed to yield better imperceptibility. Such criteria, however, only restrict the magnitude of the perturbation; the affected area and the spatial distribution of the perturbation also need to be taken into account. This paper presents an algorithm that improves the visual imperceptibility of adversarial examples by constraining both the area and the distribution of the perturbation. The design follows three principles: 1) the perturbation should, as far as possible, be confined to a single semantic region of the image, such as the target object or the background; 2) the distribution of the perturbation should be consistent with the image structure; and 3) invalid perturbation should be generated as little as possible.
Method
We propose an algorithm that weakens the visual perceptibility of adversarial examples under the black-box setting by constraining the area and the distribution of the perturbation. The method consists of two steps. In the first step, the critical regions of an image are extracted by a convolutional network with an attention mechanism. A critical region is the area that has the greatest influence on the output of the target model; fusing a perturbation into it raises the probability of an erroneous output, and if the region is extracted accurately, perturbing it alone is sufficient to cause a classification error. To train this extraction network, Gaussian noise with a fixed magnitude serves as the perturbation: the noise is added to the extracted critical region to form an adversarial example, which is then fed to the discriminator and to the classification model under attack to compute the losses. In the second step, the weights of the extraction network are frozen. The image is fed into a generator with a self-attention mechanism and into the extraction network to produce the perturbation and the critical-region mask, respectively. The perturbation is multiplied by the mask and fused with the image to generate the adversarial example, which is again passed to the discriminator and to the classification model under attack; the losses are computed and the generator is optimized. Since the learned perturbation of the second step should perform at least as well as the Gaussian noise used in the first step, the first step also provides a lower bound on the attack success rate of the second step.
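The following is a minimal PyTorch-style sketch of the two training steps described above. It assumes `extractor`, `generator`, `discriminator`, and `target` are user-defined `nn.Module` instances; the noise scale, loss weighting, and the discriminator update are simplified or omitted and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def train_step1(extractor, target, discriminator, x, labels, opt_e, sigma=0.1):
    """Step 1: train the critical-region extractor, using fixed-scale Gaussian
    noise (sigma is an assumed value) as a stand-in perturbation."""
    opt_e.zero_grad()
    mask = extractor(x)                                   # critical-region map in [0, 1]
    noise = sigma * torch.randn_like(x)                   # fixed Gaussian perturbation
    x_adv = torch.clamp(x + mask * noise, 0.0, 1.0)
    adv_loss = -F.cross_entropy(target(x_adv), labels)    # push the attacked model to err
    d_out = discriminator(x_adv)
    gan_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    (adv_loss + gan_loss).backward()
    opt_e.step()

def train_step2(extractor, generator, discriminator, target, x, labels, opt_g):
    """Step 2: freeze the extractor; train the self-attention generator whose
    perturbation, restricted to the critical region, fools the target model."""
    with torch.no_grad():
        mask = extractor(x)                               # extractor weights are fixed
    opt_g.zero_grad()
    delta = generator(x)                                  # perturbation from the generator
    x_adv = torch.clamp(x + mask * delta, 0.0, 1.0)
    adv_loss = -F.cross_entropy(target(x_adv), labels)
    d_out = discriminator(x_adv)
    gan_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    (adv_loss + gan_loss).backward()
    opt_g.step()
```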
which sets a lower constraint for the success rate of the second step. In the first step of training
we would calculate the perception loss between the original image and the critical regions based on convolution network extraction. Global perception loss was first used in the image style transfer task to maintain the image structure information for the task overall
which can keep the consistency between the perturbation and the image structures to lower the visual perceptibility of the adversarial example.
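A common way to realize such a perceptual loss, sketched below under the assumption of a truncated pretrained VGG-16 feature extractor (the abstract does not specify the backbone), is to compare the deep features of the two inputs:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Feature-space (perceptual) loss in the style-transfer sense: the mean
    squared distance between deep features of two images."""
    def __init__(self, num_layers=16):
        super().__init__()
        # frozen, truncated VGG-16 as the feature extractor (an assumption here)
        self.features = vgg16(weights="IMAGENET1K_V1").features[:num_layers].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, img_a, img_b):
        return torch.mean((self.features(img_a) - self.features(img_b)) ** 2)
```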
Result
We compare our algorithm with nine existing attacks, including both white-box and black-box algorithms, on three public datasets. The quantitative metrics are structural similarity (SSIM, higher is better), mean square error (MSE, lower is better), and attack success rate (ASR, higher is better). MSE measures the intensity of the perturbation, while SSIM evaluates the influence of the perturbation on the structural information of the image. Adversarial examples generated by the different algorithms are also presented for a qualitative comparison of perceptibility. The experiments show that the attack success rate of the proposed method is on par with that of the existing methods on three commonly used networks: the gap is less than 3% on the low-resolution dataset CIFAR-10 and less than 0.5% on the medium- and high-resolution datasets Tiny-ImageNet and ImageNet. On CIFAR-10, compared with the fast gradient sign method (FGSM), the basic iterative method (BIM), DeepFool, perceptual color distance C&W (PerC-C&W), auto projected gradient descent (APGD), AutoAttack, and AdvGAN, our MSE is lower by 45.1%, 34.91%, 29.3%, 75.6%, 69.0%, 53.9%, and 42.1%, respectively, and our SSIM is higher by 11.7%, 8%, 0.8%, 18.6%, 7.73%, 4.56%, and 8.4%, respectively. On Tiny-ImageNet, compared with FGSM, BIM, PerC-C&W, APGD, AutoAttack, and AdvGAN, MSE is lower by 69.7%, 63.8%, 71.6%, 82.21%, 79.09%, and 72.7%, respectively, and SSIM is higher by 10.1%, 8.5%, 38.1%, 5.08%, 1.12%, and 12.8%, respectively.
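For reference, the three metrics can be computed as in the following sketch, which assumes float images in [0, 1] and a classifier that returns predicted class indices; the exact SSIM settings used in the paper are not given in this abstract.

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def evaluate(classify, images, adv_images, labels):
    """Compute mean MSE, mean SSIM, and attack success rate (ASR) over a batch.
    images, adv_images: (N, H, W, C) float arrays in [0, 1]; labels: (N,) ints."""
    mse = np.mean([mean_squared_error(x, xa) for x, xa in zip(images, adv_images)])
    ssim = np.mean([structural_similarity(x, xa, channel_axis=-1, data_range=1.0)
                    for x, xa in zip(images, adv_images)])
    preds = classify(adv_images)          # assumed to return predicted class indices
    asr = np.mean(preds != labels)        # fraction of adversarial examples misclassified
    return mse, ssim, asr
```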
Conclusion
We analyze the problems of current methods with respect to the visual perceptibility of adversarial examples and propose a generation method that improves their visual imperceptibility. The results on three datasets indicate that, with an attack success rate comparable to existing methods, our algorithm achieves better visual imperceptibility in both the qualitative and the quantitative evaluation.
Carlini N and Wagner D. 2017. Towards evaluating the robustness of neural networks//2017 IEEE Symposium on Security and Privacy (SP). San Jose, USA: IEEE: 39-57 [DOI: 10.1109/SP.2017.49]
Che Z H, Borji A, Zhai G T, Ling S Y, Li J and Le Callet P. 2019. A new ensemble adversarial attack powered by long-term gradient memories [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1911.07682.pdf
Croce F and Hein M. 2020. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/2003.01690.pdf
Dong Y P, Liao F Z, Pang T Y, Su H, Zhu J, Hu X L and Li J G. 2018. Boosting adversarial attacks with momentum//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9185-9193 [DOI: 10.1109/CVPR.2018.00957]
Goodfellow I J, Shlens J and Szegedy C. 2015. Explaining and harnessing adversarial examples [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1412.6572.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial networks [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1406.2661.pdf
Jandial S, Mangla P, Varshney S and Balasubramanian V. 2019. AdvGAN++: harnessing latent layers for adversary generation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 2045-2048 [DOI: 10.1109/ICCVW.2019.00257]
Kurakin A, Goodfellow I J and Bengio S. 2016. Adversarial examples in the physical world [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1607.02533v4.pdf
Liu Y P, Chen X Y, Liu C and Song D. 2017. Delving into transferable adversarial examples and black-box attacks [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1611.02770.pdf
Moosavi-Dezfooli S M, Fawzi A and Frossard P. 2016. DeepFool: a simple and accurate method to fool deep neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2574-2582 [DOI: 10.1109/CVPR.2016.282]
Pang T Y, Xu K, Du C, Chen N and Zhu J. 2019. Improving adversarial robustness via promoting ensemble diversity [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1901.08846.pdf
Papernot N, McDaniel P, Jha S, Fredrikson M, Celik Z B and Swami A. 2016. The limitations of deep learning in adversarial settings//2016 IEEE European Symposium on Security and Privacy. Saarbruecken, Germany: IEEE: 372-387 [DOI: 10.1109/EuroSP.2016.36]
Park J, Woo S, Lee J Y and Kweon I S. 2018. BAM: bottleneck attention module [EB/OL]. [2021-03-22]. https://arxiv.org/pdf/1807.06514.pdf
Phan H, Xie Y, Liao S Y, Chen J and Yuan B. 2020. CAG: a real-time low-cost enhanced-robustness high-transferability content-aware adversarial attack generator. Proceedings of the AAAI Conference on Artificial Intelligence, 34(4): 5412-5419[DOI: 10.1609/aaai.v34i04.5990]
Rony J, Hafemann L G, Oliveira L S, Ben Ayed I, Sabourin R and Granger E. 2019. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4317-4325 [DOI: 10.1109/CVPR.2019.00445]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 618-626 [DOI: 10.1109/ICCV.2017.74]
Shi W, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Shi Y C, Wang S Y and Han Y H. 2019. Curls and whey: boosting black-box adversarial attacks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6512-6520 [DOI: 10.1109/CVPR.2019.00668]
Su J W, Vargas D V and Sakurai K. 2019. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5): 828-841[DOI: 10.1109/TEVC.2019.2890858]
Xiang S K, Cao T Y, Fang Z and Hong S Z. 2020. Dense weak attention model for salient object detection. Journal of Image and Graphics, 25(1): 136-147
项圣凯, 曹铁勇, 方正, 洪施展. 2020. 使用密集弱注意力机制的图像显著性检测. 中国图象图形学报, 25(1): 136-147 [DOI: 10.11834/jig.190187]
Xiao C W, Li B, Zhu J Y, He W, Liu M Y and Song D. 2018. Generating adversarial examples with adversarial networks//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: AAAI Press: 3905-3911 [DOI: 10.24963/ijcai.2018/543]
Xie C H, Zhang Z S, Zhou Y Y, Bai S, Wang J Y, Ren Z and Yuille A L. 2019. Improving transferability of adversarial examples with input diversity//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2725-2734 [DOI: 10.1109/CVPR.2019.00284]
Yang J, Li W J, Wang R G and Xue L X. 2019. Generative adversarial network for image super-resolution combining perceptual loss. Journal of Image and Graphics, 24(8): 1270-1282
杨娟, 李文静, 汪荣贵, 薛丽霞. 2019. 融合感知损失的生成式对抗超分辨率算法. 中国图象图形学报, 24(8): 1270-1282 [DOI: 10.11834/jig.180613]
Zhao Z Y, Liu Z R and Larson M. 2020. Towards large yet imperceptible adversarial image perturbations with perceptual color distance//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1036-1045 [DOI: 10.1109/CVPR42600.2020.00112]
Zhou M Y, Wu J, Liu Y P, Liu S C and Zhu C. 2020. DaST: data-free substitute training for adversarial attacks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 231-240 [DOI: 10.1109/CVPR42600.2020.00031]