知识蒸馏方法研究与应用综述
Survey on knowledge distillation and its application
2023, Vol. 28, No. 9, Pages: 2817-2832
Print publication date: 2023-09-16
DOI: 10.11834/jig.220273
司兆峰, 齐洪钢. 2023. 知识蒸馏方法研究与应用综述. 中国图象图形学报, 28(09):2817-2832
Si Zhaofeng, Qi Honggang. 2023. Survey on knowledge distillation and its application. Journal of Image and Graphics, 28(09):2817-2832
As deep learning methods continue to develop, their storage and computational costs keep growing, which poses a challenge to their application on resource-constrained platforms. To address this challenge, researchers have proposed a series of neural network compression methods, among which knowledge distillation is a simple yet effective one and has become a research hotspot. Knowledge distillation is characterized by its teacher-student architecture: a large network guides the training of a small network to improve the small network's performance in the application scenario, thereby achieving network compression indirectly. At the same time, knowledge distillation leaves the network architecture unchanged and therefore offers good extensibility. This paper first introduces the origin and development of knowledge distillation, then divides its improved methods into two categories according to their optimization target, namely performance-oriented knowledge distillation and compression-oriented knowledge distillation, and systematically analyzes and summarizes both classic and recent methods. Finally, several typical application scenarios of knowledge distillation are listed to deepen the understanding of the principles and applications of the various methods. Although knowledge distillation has achieved good results so far, the various methods still have shortcomings; this paper also summarizes the deficiencies of different knowledge distillation methods and, based on the analysis of network performance and network compression, gives a summary of and outlook on knowledge distillation research.
Deep learning is an effective method for various tasks, including image classification, object detection, and semantic segmentation. Various architectures of deep neural networks (DNNs), such as the Visual Geometry Group network (VGGNet), the residual network (ResNet), and GoogLeNet, have been proposed in recent years, all of which have high computational and storage costs. The effectiveness of DNNs mainly comes from their high capacity and architectural complexity, which allow them to learn sufficient knowledge from datasets and generalize well to real-world scenes. However, high capacity and architectural complexity also cause a drastic increase in storage and computational costs, thereby complicating the deployment of deep learning methods on devices with limited resources. Given the increasing demand for deep learning methods on portable devices such as mobile phones, the cost of DNNs must be urgently reduced. Researchers have developed a series of methods, collectively called model compression, to solve this problem. These methods can be divided into four main categories: network pruning, weight quantization, weight decomposition, and knowledge distillation. Knowledge distillation is a comparatively new method first introduced in 2014. It attempts to transfer the knowledge learned by a cumbersome network (the teacher network) to a lightweight network (the student network) so that the student network performs similarly to the teacher network; compression is then achieved by using the student network for inference. Traditional knowledge distillation works by providing softened labels to the student network as the training target instead of having the student network learn from the ground truth alone. By learning from softened labels, the student network can capture the correlation among classes in a classification problem, which serves as extra supervision during training. The student network trained by knowledge distillation should ideally approximate the performance of the teacher network, so that the computational and storage costs of the compressed network are reduced with only minor degradation compared with the uncompressed network. However, this ideal is rarely attainable when the compression rate is large enough to be comparable with that of other model compression methods. Knowledge distillation can therefore also be regarded as a means of enhancing the performance of a deep learning model so that it performs better than other models of similar size, while still serving as a model compression method.

In this study, we aim to review the knowledge distillation methods developed in recent years from a new perspective. We sort the existing methods according to their target, dividing them into performance-oriented methods and compression-oriented methods. Performance-oriented methods emphasize improving the performance of the student network, whereas compression-oriented methods focus on the relationship between the size of the student network and its performance. We further divide these two categories into more specific lines of research. For performance-oriented methods, we describe state-of-the-art methods from two aspects: the representation of knowledge and the way of learning knowledge. The representation of knowledge has been widely studied in recent years: researchers attempt to derive knowledge from the teacher network beyond its output vectors in order to enrich the knowledge available during training.
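As a concrete illustration of the traditional soft-target objective described above, the following is a minimal sketch of Hinton-style distillation, assuming a PyTorch setting; the function name distillation_loss and the hyperparameters T (temperature) and alpha (loss weight) are illustrative choices rather than notation taken from the surveyed papers.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a temperature-softened KL term and the usual cross-entropy."""
    # Soften both output distributions with temperature T; the T*T factor keeps
    # the gradient magnitude of the soft term comparable to the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label supervision from the ground truth.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

if __name__ == "__main__":
    # Toy example: a batch of 8 samples over 10 classes with random logits.
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels).item())
```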
Other forms of knowledge include middle-layer feature maps, representations extracted from middle layers, and structural knowledge (a minimal code sketch of feature-map matching is given at the end of this abstract). By combining this extra knowledge with the soft target of traditional knowledge distillation, the student network can learn how the teacher network behaves during forward propagation and thus act more similarly to the teacher network. Studies on the way of learning knowledge explore distillation architectures built on the basic teacher-student architecture; architectures including online distillation, self-distillation, multi-teacher distillation, progressive knowledge distillation, and generative adversarial network (GAN)-based knowledge distillation have been proposed. These architectures focus on the effectiveness of distillation and on different use cases. For example, online distillation and self-distillation can be applied when a teacher network with high capacity is unavailable. In compression-oriented knowledge distillation, researchers try to combine neural architecture search (NAS) methods with knowledge distillation to balance the performance and the size of the student network. Many studies have also examined how the size difference between the teacher network and the student network affects distillation performance; they conclude that a wide gap between the teacher and the student can cause performance degradation and therefore propose bridging this gap with several middle-sized networks. We also formalize the different kinds of knowledge distillation methods and present the corresponding figures in a uniform style to help researchers grasp the basic ideas and learn about recent work on knowledge distillation. One of the most notable characteristics of knowledge distillation is that the architectures of the teacher network and the student network stay intact during training, so methods for other tasks can be incorporated easily. We therefore also introduce recent work applying knowledge distillation to different tasks, including object detection, face recognition, and natural language processing.

Finally, we summarize the knowledge distillation methods mentioned above and propose several possible research directions. Recent research on knowledge distillation has mainly focused on enhancing the performance of the student network, where the major problem lies in finding a feasible source of knowledge in the teacher network. Moreover, compression-oriented knowledge distillation suffers from the problem of designing the search space when NAS adjusts the network architecture. On the basis of this analysis, we propose three possible directions for future research: 1) obtaining knowledge from various tasks and architectures in the form of knowledge distillation, 2) developing a search space for NAS combined with knowledge distillation and adjusting the teacher network while searching for the student network, and 3) developing a metric for knowledge distillation and other model compression methods that evaluates both task performance and compression performance.
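To make the middle-layer feature-map knowledge mentioned above more concrete, the sketch below aligns a student feature map with a teacher feature map before matching them; the 1 × 1 convolutional adapter, bilinear resizing, and mean-squared-error objective are illustrative assumptions in the spirit of hint-based feature distillation, not a specific method from this survey, and a PyTorch setting is again assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Matches a student feature map to a teacher feature map."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 convolution maps the student features to the teacher's channel width.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        aligned = self.adapter(student_feat)
        # Resize if the two networks downsample to different spatial sizes.
        if aligned.shape[-2:] != teacher_feat.shape[-2:]:
            aligned = F.interpolate(aligned, size=teacher_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        # The teacher features are fixed targets, so no gradients flow into the teacher.
        return F.mse_loss(aligned, teacher_feat.detach())

if __name__ == "__main__":
    # Toy example: student feature 64x28x28, teacher feature 256x14x14.
    distiller = FeatureDistiller(student_channels=64, teacher_channels=256)
    s_feat = torch.randn(2, 64, 28, 28)
    t_feat = torch.randn(2, 256, 14, 14)
    print(distiller(s_feat, t_feat).item())
```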
Keywords: knowledge distillation; deep learning; computer vision; neural network; model compression