Multi-domain feature mixup method for boosting adversarial example transferability

Wan Peng; Hu Cong; Wu Xiaojun

Published: 2024-04-15
DOI: 10.11834/jig.230895

Wan Peng, Hu Cong, Wu Xiaojun (School of Artificial Intelligence and Computer Science, Jiangnan University)

Abstract

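Objective: Adversarial examples pose a serious threat to the security of deep neural networks (DNNs), a phenomenon that has attracted wide attention. Many current black-box adversarial attack methods share a common problem: they attack in a single domain only, either the spatial domain or the frequency domain, so the generated adversarial examples cannot fully exploit the target model's potential vulnerabilities in the other domain, and their transferability suffers. To address this, this paper proposes a multi-domain feature mixup method (multiple domain feature mixup, MDFM) for boosting adversarial example transferability, with the goal of raising the attack success rate in black-box settings.

Method: The discrete cosine transform is used to transform the image from the spatial domain to the frequency domain, and the clean frequency-domain features of the original image are stored. The inverse discrete cosine transform then converts the image back to the spatial domain, after which a surrogate model extracts the clean spatial-domain features of the image. While the adversarial example is being generated, features are mixed in both the frequency domain and the spatial domain, which ultimately yields adversarial examples with better transferability.

Results: Extensive experiments were conducted on the CIFAR-10 and ImageNet datasets, with comparisons against a range of attack methods. On CIFAR-10, the average attack success rate across different models reached 89.8%. On ImageNet, with ResNet-50 and Inception-v3 as surrogate models, the average attack success rates on different DNN models reached 75.9% and 40.6%, respectively. With ResNet-50 and adv-ResNet-50 as surrogate models and Transformer-based models as targets, the average attack success rates were 32.3% and 59.4%, respectively, surpassing the current state-of-the-art black-box adversarial attack methods.

Conclusion: By mixing features in both the spatial and frequency domains, the multi-domain feature mixup attack pushes adversarial examples to draw on a broad set of features across domains to overcome the interference introduced by clean features, which improves their transferability. The code is available at https://github.com/linghuchong111da/MDFM.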
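The DCT round trip and the frequency-domain mixing step described in the method can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that): the orthonormal DCT-II matrix, the toy image batch, the perturbation bound, and the fixed scalar `ratio` are all assumptions chosen for illustration.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix D of size n x n, so that D @ D.T is the identity."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)           # rescale the DC row for orthonormality
    return d


def dct2(x):
    """2-D DCT over the last two axes of a (..., H, W) array."""
    dh, dw = dct_matrix(x.shape[-2]), dct_matrix(x.shape[-1])
    return dh @ x @ dw.T


def idct2(c):
    """Inverse of dct2 (the transpose of an orthonormal matrix is its inverse)."""
    dh, dw = dct_matrix(c.shape[-2]), dct_matrix(c.shape[-1])
    return dh.T @ c @ dw


# First iteration: store the clean frequency-domain features.
rng = np.random.default_rng(0)
x_clean = rng.random((2, 3, 8, 8))            # toy batch: (batch, channel, H, W)
clean_freq = dct2(x_clean)

# Later iteration: mix a perturbed image with the stored clean spectrum.
noise = rng.uniform(-8 / 255, 8 / 255, x_clean.shape)  # stand-in perturbation
x_adv = np.clip(x_clean + noise, 0.0, 1.0)
ratio = 0.3                                   # hypothetical fixed mixing ratio
mixed_freq = (1.0 - ratio) * dct2(x_adv) + ratio * clean_freq
x_mixed = idct2(mixed_freq)                   # back to the spatial domain
```

Because the orthonormal DCT is linear, the result for a single scalar ratio can be checked against the same blend computed directly on pixels, which makes a convenient sanity test for the transform code.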

Keywords

adversarial examples; frequency domain; feature mixup; black-box adversarial attack; deep neural networks

Multi-domain feature mixup method for boosting adversarial example transferability

Wan Peng, Hu Cong, Wu Xiaojun (School of Artificial Intelligence and Computer Science, Jiangnan University)

Abstract

Objective: Deep neural networks (DNNs) are widely applied across diverse domains and perform remarkably well, particularly in computer vision. However, adversarial examples pose a significant security threat to them. Adversarial attacks are categorized as white-box or black-box according to their access to the target model's architecture and parameters. White-box attacks exploit knowledge of the target model, for example through backpropagation, to attain high attack success rates. Black-box attacks, by contrast, have no information about the target model: they generate adversarial examples on a surrogate model and then launch them against the target model. Although black-box attacks match real-world scenarios more closely, their success rates are generally lower because of this limited knowledge. Existing adversarial attack methods typically perturb only the spatial domain or consider only the frequency information of images, neglecting the other domain. Both the spatial-domain and the frequency-domain information of an image are crucial for model recognition, so considering a single domain limits the generalization of the generated adversarial examples. This paper addresses the gap with a novel black-box adversarial attack method, multi-domain feature mixup (MDFM), which aims to enhance the transferability of adversarial examples by considering both the spatial and the frequency domain.

Method: In the initial iteration, we apply the discrete cosine transform (DCT) to convert the original images from the spatial domain to the frequency domain and store their clean frequency-domain features. We then use the inverse discrete cosine transform (IDCT) to transform the images back to the spatial domain, and a surrogate model extracts the clean spatial-domain features of the original images. In subsequent iterations, we transform the perturbed images from the spatial domain to the frequency domain. We shuffle the stored clean features across images, so that each image is mixed with either its own clean features or those of another image, and mix the frequency-domain features of the perturbed image with the clean frequency-domain features. A random mixing ratio is applied per channel, so that the clean frequency-domain features influence the image to varying degrees and produce diverse interference effects. The mixed features are then converted back to the spatial domain, where they are further mixed with the clean spatial-domain features as they pass through the surrogate model. The same shuffling and random per-channel mixing ratios are applied in this step. Finally, the adversarial examples are generated.

Result: We conduct extensive experiments on the CIFAR-10 and ImageNet datasets. On CIFAR-10, we use ResNet-50 as the surrogate model to generate adversarial examples and test them on VGG-16, ResNet-18, MobileNet-v2, Inception-v3, DenseNet-121, and ensemble models trained under four different defense configurations. The proposed method is compared with advanced black-box adversarial attack methods such as VT, Admix, and clean feature mixup (CFM). Our approach achieves the highest attack success rate on all models, with an average attack success rate of 89.8%, which is 0.5% higher than that of the state-of-the-art CFM method. On ImageNet, we use ResNet-50 and Inception-v3 as surrogate models and test on ten target models: VGG-16, ResNet-18, ResNet-50, DenseNet-121, Xception, MobileNet-v2, EfficientNet-B0, Inception ResNet-v2, Inception-v3, and Inception-v4. With ResNet-50 as the surrogate model, MDFM attains the highest attack success rate on all target models, and its average attack success rate is 1.6% higher than that of CFM. The largest improvement over CFM, 3.6%, is observed on MobileNet-v2. With Inception-v3 as the surrogate model, MDFM achieves the highest attack success rate on nine models and outperforms CFM on all models. The largest improvement over CFM, 2.5%, occurs on three target models. The average attack success rate of MDFM reaches 40.6%, surpassing CFM by 1.4%. To further validate the effectiveness of MDFM, we conduct tests on adv-ResNet-50 and five Transformer-based models, using ResNet-50 and adv-ResNet-50 in turn as the surrogate model. With ResNet-50 as the surrogate model, MDFM achieves the highest attack success rate on all five models, with an average improvement of 1.5% over CFM. The most significant improvement, 2.8%, is observed on the PiT model. With adv-ResNet-50 as the surrogate model, MDFM achieves an average attack success rate of 59.4%, higher than that of any other method. On the ConViT model, MDFM improves on CFM by 1.9%, and its average attack success rate surpasses that of CFM by 0.8%.

Conclusion: This paper introduces a novel multi-domain feature mixup attack method (MDFM) designed for adversarial attacks in black-box scenarios. MDFM mixes clean features across multiple domains, prompting adversarial examples to draw on a more diverse set of features to overcome the interference caused by clean features. This yields more diverse adversarial examples and enhances their transferability.
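The store-then-mix procedure in the Method part (record clean features in the first iteration, then shuffle them and blend them in with random per-channel ratios) could be sketched as below. This is an illustrative sketch only: the class name `CleanFeatureMixer`, the convex-combination form of the mix, and the `max_ratio` bound are assumptions, not details taken from the paper.

```python
import numpy as np


class CleanFeatureMixer:
    """Records clean features on the first call and mixes them in on later calls."""

    def __init__(self, max_ratio=0.5, seed=0):
        self.max_ratio = max_ratio            # hypothetical upper bound on the mixing ratio
        self.rng = np.random.default_rng(seed)
        self.clean = None                     # filled during the first (clean) pass

    def __call__(self, feat):
        # feat has shape (batch, channels, height, width)
        if self.clean is None:
            self.clean = feat.copy()          # first iteration: store, do not mix
            return feat
        b, c = feat.shape[:2]
        # Shuffle so each image meets its own or another image's clean features.
        perm = self.rng.permutation(b)
        # One random ratio per image and channel, broadcast over height and width.
        ratio = self.rng.uniform(0.0, self.max_ratio, size=(b, c, 1, 1))
        return (1.0 - ratio) * feat + ratio * self.clean[perm]


mixer = CleanFeatureMixer(max_ratio=0.5, seed=0)
mixer(np.zeros((4, 3, 2, 2)))                 # clean pass stores the features
mixed = mixer(np.ones((4, 3, 2, 2)))          # later pass returns mixed features
```

The same callable could in principle be applied to frequency-domain coefficients or to intermediate feature maps of a surrogate model, since it only assumes a `(batch, channels, height, width)` layout.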

Keywords

adversarial example; frequency domain; feature mixup; black-box adversarial attack; deep neural networks