Model functionality stealing attacks based on real data awareness

Yanming Li; Changsheng Li; Jiaqi Yu; Ye Yuan; Guoren Wang

doi:10.11834/jig.211265

Multimedia Intelligent Security

2022, Vol. 27, No. 9, pp. 2721-2732
Received: 2022-01-11

Revised: 2022-04-28

Accepted: 2022-05-05

Published in print: 2022-09-16


Yanming Li, Changsheng Li, Jiaqi Yu, Ye Yuan, Guoren Wang. Model functionality stealing attacks based on real data awareness[J]. Journal of Image and Graphics, 2022, 27(9): 2721-2732. DOI: 10.11834/jig.211265.

Summary

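Objective

Model functionality stealing attacks are one of the core problems in artificial intelligence security. Their goal is to use limited information about a target model to train a clone model whose performance is close to that of the target, thereby stealing the model's functionality. A classic line of work addresses this problem with generative models: images produced by a generator serve as query data, and the clone model is trained by constraining the two models' predictions on the same query data to be consistent. However, the images produced by the generators in these methods are often unrecognizable to the human eye and carry no semantic information, so the target model's outputs provide little effective guidance. To address this problem, a new model stealing attack method is proposed that achieves effective functionality stealing against image classifiers.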
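To make the consistency constraint concrete, here is a minimal sketch of one common choice: an ℓ1 disagreement between the clone's and the target's output vectors on the same batch of queries, the form of loss adopted by DFME. The abstract does not say which distance the generator-based baselines use, so `l1_disagreement` is a hypothetical helper for illustration only. The clone is trained to minimize this quantity; in DFME-style methods the generator is trained to maximize it, so that new queries land where the two models still disagree.

```python
from typing import Sequence


def l1_disagreement(clone_out: Sequence[Sequence[float]],
                    target_out: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference between two models' outputs on the same queries.

    clone_out[i] and target_out[i] are the output vectors (e.g. logits) of the
    clone and the black-box target for query image i. The clone minimizes this
    value; a DFME-style generator would instead try to maximize it.
    """
    if len(clone_out) != len(target_out):
        raise ValueError("both models must be evaluated on the same queries")
    total, count = 0.0, 0
    for c_row, t_row in zip(clone_out, target_out):
        for c, t in zip(c_row, t_row):
            total += abs(c - t)
            count += 1
    return total / count if count else 0.0


# Usage: identical outputs give 0; the loss grows as predictions diverge.
print(l1_disagreement([[2.0, -1.0]], [[2.0, -1.0]]))  # 0.0
print(l1_disagreement([[1.0, 2.0]], [[0.0, 0.0]]))    # 1.5
```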

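Method

With the help of real image data, a generative adversarial network (GAN) is used to make the images produced by the generator close to real images, which strengthens the physical meaning of the target model's outputs. In addition, to improve the performance of the clone model, a new loss function based on the idea of contrastive learning is proposed for network optimization.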

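Result

Experimental results on two public datasets, CIFAR-10 (Canadian Institute for Advanced Research-10) and SVHN (street view house numbers), show that the proposed method achieves good functionality stealing performance. On CIFAR-10, its stealing accuracy is 5% higher than that of current advanced methods. Moreover, under the same query cost, the proposed method achieves better stealing results, effectively reducing the cost of querying the target model.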

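Conclusion

Approaching the problem from the perspective of data realism, the proposed model stealing attack method effectively improves the performance of functionality stealing attacks against image classifiers and, to some extent, reduces the cost of querying the target model.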

Abstract

Objective

Model stealing attacks are a sub-field of artificial intelligence (AI) security whose aim is to steal private information about a target model, including its structure, parameters, and functionality. Our research focuses on model functionality stealing attacks: we target a deep-learning-based multi-class classifier and train a clone model to replicate the functionality of the black-box target classifier. Currently, most functionality stealing attacks are query-based: they replicate the black-box target classifier by analyzing the query data and the target model's responses. Attacks based on generative models are popular and have obtained promising results in functionality stealing. However, they face two main challenges. First, target image classifiers are commonly trained on real images, but because these methods do not use real data to supervise the training of the generative model, the generated images degenerate into noise rather than realistic images. In other words, the images used by these methods carry little semantic information, so the target model's predictions provide little effective guidance for training the clone model, which limits how well the clone model can be trained. Second, training the generative model requires issuing many queries to the target classifier, which places a heavy burden on the query budget. Because the target model is a black box, its gradient must be approximated via zeroth-order gradient estimation in order to train the generator; hence, the generator cannot obtain accurate gradient information for updating itself.
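Because the target is a black box, any gradient that must flow through it has to be estimated from its outputs alone. Below is a minimal sketch of a forward-difference estimator with random unit directions, similar in spirit to the zeroth-order estimators used by MAZE and DFME. The function name `zeroth_order_gradient` and the defaults for `num_directions` and `eps` are illustrative assumptions, not this paper's settings. The sketch also makes the query cost visible: each estimate costs `num_directions + 1` calls to the black box, and the estimate is only approximate, which is the inaccuracy described above.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def zeroth_order_gradient(f: Callable[[Vector], float], x: Vector,
                          num_directions: int = 100, eps: float = 1e-3,
                          rng: Optional[random.Random] = None) -> Vector:
    """Estimate the gradient of a black-box scalar function f at x.

    Uses only evaluations of f (no backpropagation), so it costs
    num_directions + 1 queries to the black box.
    """
    if rng is None:
        rng = random.Random(0)
    d = len(x)
    fx = f(x)  # one query at the base point, reused for every direction
    grad = [0.0] * d
    for _ in range(num_directions):
        # Random direction drawn uniformly from the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]
        # Forward difference approximates the directional derivative grad . u.
        shifted = [xi + eps * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / eps
        # d * (grad . u) * u is an unbiased estimate of grad for u uniform on the sphere.
        for i in range(d):
            grad[i] += d * slope * u[i] / num_directions
    return grad


# Usage: a toy "black box" f(x) = sum(x_i^2), whose true gradient at x is 2x.
estimate = zeroth_order_gradient(lambda v: sum(t * t for t in v), [1.0, 2.0, 3.0],
                                 num_directions=20000)
print([round(g, 1) for g in estimate])  # should be near [2.0, 4.0, 6.0]
```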

Method

We use generative adversarial nets (GAN) and contrastive learning to steal the functionality of the target classifier. The key idea is to use a GAN to extract prior information about real images from public datasets, so that the target classifier's predictions provide effective guidance for training the clone model. To make the generated images more realistic, public datasets are introduced to supervise the training of the generator. To make the generative model effective, we adopt the deep convolutional GAN (DCGAN) as the backbone, in which both the generator and the discriminator are composed of convolutional layers with non-linear activation functions. To update the generator, we approximate the gradient information of the target model via zeroth-order gradient estimation. Simultaneously, we use the public dataset to guide the training of the GAN so that the generator acquires information about real images. In other words, the public dataset acts as a regularization term that constrains the generator's solution space. In this way, the generator can produce near-realistic images, so that the target model's predictions carry more physical meaning for guiding the training of the clone model. To reduce the query budget, we pre-train the GAN on public datasets so that it acquires prior information about real images before the clone model is trained. Compared with previous approaches that train a randomly initialized generator, this lets the generator better serve the training of the clone model. To extend the objective function for training the clone model, we introduce contrastive learning to model stealing attacks. Traditional model functionality stealing attack methods train the clone model only by maximizing the similarity between the two models' predictions on the same image. Here, we use contrastive learning to also account for the diversity of the two models' predictions on different images. A positive pair consists of the two models' predictions on the same image, and a negative pair consists of the two models' predictions on two different images. We use cosine similarity to measure how similar two predictions are. Then, we use the InfoNCE loss to simultaneously maximize the similarity of positive pairs and the diversity of negative pairs.
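The contrastive objective described above can be sketched as follows. This dependency-free version only computes the loss value; an actual attack would implement it in an autodiff framework so that gradients flow into the clone. The temperature value, the use of raw output vectors as the compared predictions, and the one-directional form (the clone's prediction as anchor, the target's predictions as candidates) are assumptions, since the abstract does not specify them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two prediction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def info_nce(clone_preds: Sequence[Sequence[float]],
             target_preds: Sequence[Sequence[float]],
             temperature: float = 0.5) -> float:
    """InfoNCE over a batch of predictions (value only, no gradients).

    Positive pair: clone and target predictions on the same image i.
    Negative pairs: clone prediction on image i vs. target prediction on image j != i.
    """
    n = len(clone_preds)
    total = 0.0
    for i in range(n):
        # Similarity of the clone's prediction on image i to every target prediction.
        logits = [cosine_similarity(clone_preds[i], target_preds[j]) / temperature
                  for j in range(n)]
        # -log softmax at the positive index, computed with log-sum-exp for stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / n


# Usage: clone outputs that match the target image by image score a lower loss
# than the same outputs paired with the wrong images.
target = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [row[:] for row in target]
shuffled = [target[1], target[2], target[0]]
print(round(info_nce(aligned, target), 2))   # 0.24
print(round(info_nce(shuffled, target), 2))  # 2.24
```

Compared with a loss that only pulls matched predictions together, the denominator here also pushes each clone prediction away from the target's predictions on other images, which is the "diversity of negative pairs" term described above.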

Result

To demonstrate the performance of our method, we carry out model functionality stealing attacks on two black-box target classifiers, trained on the Canadian Institute for Advanced Research-10 (CIFAR-10) and street view house numbers (SVHN) datasets, respectively. Both target models are based on ResNet-34, and both clone models are based on ResNet-18. The public datasets used do not overlap with the training datasets of the target classifiers. We evaluate the trained clone models on the CIFAR-10 and SVHN test sets, where they achieve accuracies of 92.3% and 91.8%, respectively, corresponding to normalized clone accuracies of 0.97× and 0.98×. In particular, on the CIFAR-10 target model, our method improves normalized clone accuracy by 5% over data-free model extraction (DFME). Our method also substantially reduces the query budget. To reach 85% clone accuracy on the CIFAR-10 test set, DFME requires 8.6 M queries, whereas our method requires only 5.8 M, i.e., 2.8 M fewer than DFME. To reach 90% accuracy, our method requires 9.4 M queries, less than half of the 20 M required by DFME. These results demonstrate that our method improves the performance of generative-model-based functionality stealing attacks while reducing the query budget.
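The reported figures can be checked with simple arithmetic. The definition of normalized clone accuracy below (clone accuracy divided by target accuracy) follows the convention used in the DFME line of work and is an assumption here, since the abstract does not define it.

```latex
\begin{aligned}
\text{normalized clone accuracy} &= \frac{\mathrm{Acc}_{\text{clone}}}{\mathrm{Acc}_{\text{target}}} \\
\text{queries saved at 85\% accuracy (CIFAR-10)} &= 8.6\,\mathrm{M} - 5.8\,\mathrm{M} = 2.8\,\mathrm{M} \\
\text{query ratio at 90\% accuracy (CIFAR-10)} &= \frac{9.4\,\mathrm{M}}{20\,\mathrm{M}} = 0.47 < 0.5
\end{aligned}
```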

Conclusion

We propose a novel model functionality stealing attack method that trains the clone model using prior information from real images together with contrastive learning. The experimental results demonstrate the potential of the proposed method and show that it effectively reduces the query budget.

Keywords

References

Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I and Amodei D. 2020. Language models are few-shot learners [EB/OL]. [2020-05-28]. https://arxiv.org/pdf/2005.14165.pdf

Chen P Y, Zhang H, Sharma Y, Yi J F and Hsieh C J. 2017. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models//Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. Dallas, USA: Association for Computing Machinery: 15-26 [DOI: 10.1145/3128572.3140448]

Chen T, Kornblith S, Norouzi M and Hinton G. 2020. A simple framework for contrastive learning of visual representations//Proceedings of the 37th International Conference on Machine Learning. [s. l.]: JMLR.org: 1597-1607

Correia-Silva J R, Berriel R F, Badue C, de Souza A F and Oliveira-Santos T. 2018. Copycat CNN: stealing knowledge by persuading confession with random non-labeled data//Proceedings of 2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE: 1-8 [DOI: 10.1109/IJCNN.2018.8489592]

Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]

Duchi J C, Jordan M I, Wainwright M J and Wibisono A. 2012. Finite sample convergence rates of zero-order stochastic optimization methods//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 1439-1447

Fang G F, Song J, Shen C C, Wang X C, Chen D and Song M L. 2020. Data-free adversarial distillation [EB/OL]. [2020-05-02]. https://arxiv.org/pdf/1912.11006.pdf

Li F F, Fergus R and Perona P. 2004. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories//Proceedings of 2004 Conference on Computer Vision and Pattern Recognition Workshop. Washington, USA: IEEE: #178 [DOI: 10.1109/CVPR.2004.383]

Gong X L, Chen Y J, Yang W B, Mei G H and Wang Q. 2021. InverseNet: augmenting model extraction attacks with training data inversion//Proceedings of the 30th International Joint Conference on Artificial Intelligence. Montreal, Canada: IJCAI.org: 2439-2447 [DOI: 10.24963/ijcai.2021/336]

Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680

Hadsell R, Chopra S and LeCun Y. 2006. Dimensionality reduction by learning an invariant mapping//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 1735-1742 [DOI: 10.1109/CVPR.2006.100]

He K M, Fan H Q, Wu Y X, Xie S N and Girshick R. 2020. Momentum contrast for unsupervised visual representation learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9726-9735 [DOI: 10.1109/CVPR42600.2020.00975]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]

He P S, Li W C, Zhang J Y, Wang H X and Jiang X H. 2022. Overview of passive forensics and anti-forensics techniques for GAN-generated image. Journal of Image and Graphics, 27(1): 88-110 [DOI: 10.11834/jig.210430]

Hinton G, Vinyals O and Dean J. 2015. Distilling the knowledge in a neural network [EB/OL]. [2020-05-09]. https://arxiv.org/pdf/1503.02531.pdf

Kariyappa S, Prakash A and Qureshi M K. 2021. MAZE: data-free model stealing attack using zeroth-order gradient estimation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13809-13818 [DOI: 10.1109/CVPR46437.2021.01360]

Karras T, Laine S and Aila T. 2019. A style-based generator architecture for generative adversarial networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4396-4405 [DOI: 10.1109/CVPR.2019.00453]

Krizhevsky A. 2009. Learning Multiple Layers of Features from Tiny Images. Toronto: University of Toronto

Li Y and Song P H. 2022. Review of transfer learning in medical image classification. Journal of Image and Graphics, 27(3): 672-686 [DOI: 10.11834/jig.210814]

Netzer Y, Wang T, Coates A, Bissacco A, Wu B and Ng A Y. 2011. Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011): 1-9

Oh S J, Schiele B and Fritz M. 2019. Towards reverse-engineering black-box neural networks//Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Cham: Springer: 121-144 [DOI: 10.1007/978-3-030-28954-6_7]

Orekondy T, Schiele B and Fritz M. 2019. Knockoff nets: stealing functionality of black-box models//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4949-4958 [DOI: 10.1109/CVPR.2019.00509]

Papernot N, McDaniel P, Goodfellow I, Jha S, Celik Z B and Swami A. 2017. Practical black-box attacks against machine learning//Proceedings of 2017 ACM on Asia Conference on Computer and Communications Security. Abu Dhabi, United Arab Emirates: Association for Computing Machinery: 506-519 [DOI: 10.1145/3052973.3053009]

Quattoni A and Torralba A. 2009. Recognizing indoor scenes//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 413-420 [DOI: 10.1109/CVPR.2009.5206537]

Radford A, Metz L and Chintala S. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks [EB/OL]. [2020-01-07]. https://arxiv.org/pdf/1511.06434.pdf

Strubell E, Ganesh A and McCallum A. 2019. Energy and policy considerations for deep learning in NLP [EB/OL]. [2020-06-05]. https://arxiv.org/pdf/1906.02243.pdf

Tramèr F, Zhang F, Juels A, Reiter M K and Ristenpart T. 2016. Stealing machine learning models via prediction APIs//Proceedings of the 25th USENIX Conference on Security Symposium. Austin, USA: USENIX Association: 601-618

Truong J B, Maini P, Walls R J and Papernot N. 2021. Data-free model extraction//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4769-4778 [DOI: 10.1109/CVPR46437.2021.00474]

van den Oord A, Li Y Z and Vinyals O. 2019. Representation learning with contrastive predictive coding [EB/OL]. [2020-06-22]. https://arxiv.org/pdf/1807.03748.pdf

Wang B H and Gong N Z. 2018. Stealing hyperparameters in machine learning//2018 IEEE Symposium on Security and Privacy (SP). San Francisco, USA: IEEE: 36-52 [DOI: 10.1109/SP.2018.00038]

Xiao H, Rasul K and Vollgraf R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms [EB/OL]. [2020-09-15]. https://arxiv.org/pdf/1708.07747.pdf