Model distillation for high-level semantic understanding: a survey
Vol. 28, Issue 4, Pages 935-962 (2023)
Published: 16 April 2023
DOI: 10.11834/jig.210337
Sun Ruoyu, Xiong Hongkai. 2023. Model distillation for high-level semantic understanding: a survey. Journal of Image and Graphics, 28(04):0935-0962
Computer vision aims to build computational models that approximate the human visual system. With the development of deep neural networks (DNNs), the analysis and understanding of high-level semantics has become a research focus in computer vision. High-level semantics are human-understandable, expressible descriptors of the content of media signals such as images and videos; typical high-level semantic analysis tasks include image classification, object detection, instance segmentation, semantic segmentation, video scene recognition, and object tracking. Algorithms based on deep neural networks have steadily improved the performance of computer vision tasks, but at the cost of larger models and lower computational efficiency. Model distillation is a model compression scheme based on transfer learning. It typically uses a pre-trained model as the teacher, extracts its effective representations, such as model outputs, hidden-layer features, or inter-feature similarities, and uses these representations as additional supervision signals to train a smaller, faster student model, so that the student's performance improves enough to replace the large model. Because model distillation offers a good trade-off between performance and computational complexity, it is increasingly used in deep-learning-based high-level semantic analysis. Since the concept was proposed in 2014, researchers have developed a large number of distillation methods for high-level semantic analysis, applied most widely to image classification, object detection, and semantic segmentation. This paper surveys and summarizes representative distillation schemes for these typical tasks, organized by visual task. First, starting from the most mature and most widely applied distillation methods for classification, we introduce their design ideas and application scenarios, compare selected experimental results, and point out how the conditions for applying distillation to classification differ from those for detection and segmentation. Next, we introduce several distillation methods specially designed for object detection and semantic segmentation, explaining their design goals and ideas in the context of model structure, with comparisons and analyses of selected experimental results. Finally, we summarize the current state of model distillation for high-level semantic analysis, point out remaining difficulties and shortcomings, and envision possible directions for future exploration.
Computer vision tasks aim to construct computational models that approximate the functions of the human visual system. Current deep learning models keep raising the performance upper bounds of multiple computer vision tasks, especially the analysis and understanding of high-level semantics, i.e., human-recognizable descriptors of multimedia content. Typical high-level semantic understanding tasks include image classification, object detection, instance segmentation, semantic segmentation, and video recognition and tracking. With the development of convolutional neural networks (CNNs), deep-learning-based high-level semantic understanding has benefited from increasingly deep and cumbersome models, which in turn raises problems of storage and computational cost. To obtain lighter structures and computational efficiency, many model compression strategies have been proposed, e.g., pruning, weight quantization, and low-rank factorization, but these can alter the network structure or cause severe performance drops when deployed on computer vision tasks. Model distillation is a typical compression method that applies transfer learning to model compression. In general, model distillation takes a large, complicated pre-trained model as the "teacher" and extracts its effective representations, e.g., model outputs, hidden-layer features, or similarities between feature maps. These representations are treated as extra supervision signals, together with the original ground truth, for training a lighter and faster model called the "student". Because model distillation provides a favorable balance between model performance and efficiency, it is being rapidly explored for different computer vision tasks. This paper surveys the progress of model distillation methods since their introduction in 2014 and describes their different strategies in various applications, reviewing popular distillation strategies and current distillation algorithms for image classification, object detection, and semantic segmentation. First, we introduce distillation methods for image classification, where model distillation is most mature. The fundamental approach uses the teacher classifier's output logits as soft labels, giving the student inter-category structural information that is unavailable in conventional one-hot ground truths; a minimal sketch of this soft-label loss follows below. Furthermore, hint learning exploits the hierarchical structure of neural networks and takes feature maps from hidden layers as additional teacher representations. Most distillation strategies are designed and derived from these two approaches. In terms of framework design and application scenarios, we introduce several typical distillation strategies for classification models. Some methods focus on novel supervision signals, i.e., ensembles that differ from conventional soft labels or feature maps; newly designed features for the student to mimic are typically computed from attention or similarity maps of different layers, data augmentations, or sampled images. Other methods add noise or perturbation to the teacher classifier's outputs or use probabilistic inference to narrow the gap between teacher and student.
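To make the fundamental soft-label approach concrete, the following is a minimal PyTorch-style sketch of the classic temperature-scaled distillation loss; the function name, temperature, and weighting coefficient are illustrative choices rather than values fixed by any particular method.

```python
# Minimal sketch of the classic soft-label distillation loss.
# Hyperparameters T and alpha are illustrative, not prescribed.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Combine softened teacher predictions with the hard-label loss."""
    # Softmax with temperature T exposes inter-class structure
    # ("dark knowledge") that one-hot ground truths cannot provide.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```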
These specially designed features or logits aim at a more appropriate representation of the teacher's knowledge than plain features taken from particular layers. In other methods, the distillation procedure itself is altered, and more complicated schemes are introduced to transfer the teacher's knowledge instead of simply training the student on generated labels or features. Also, since generative adversarial networks (GANs) achieve promising performance in image synthesis, some distillation methods introduce adversarial mechanisms into classifier distillation, where the teacher's features are regarded as "real" samples that the student is expected to "generate". In many practical scenarios such as model compression, self-training, and parallel computing, classifier distillation is coordinated with task-specific processes as well, e.g., fine-tuning quantized networks with full-precision teachers, distilling a student from its own earlier versions during training, and using models from different computing nodes as teachers. After introducing distillation approaches for image classification, we summarize the performance of popular strategies in a table, comparing their improvements in top-1 accuracy on several typical classification datasets. The second part of the paper focuses on distillation methods specially developed for tasks more complicated than classification, e.g., object detection, instance segmentation, and semantic segmentation. Unlike classifiers, models for these tasks contain more redundant structures with heterogeneous outputs; hence, work on distilling detectors and segmentation models remains relatively scarce compared with classifier distillation. We describe the current challenges in designing distillation frameworks for detection and segmentation, and then introduce typical distillation methods for detectors and segmentation models according to their tasks and multifaceted structures. Since few works target instance segmentation distillation specifically, we begin the second part with the closely related distillation methods for object detectors. For detectors, the localization requirement demands special attention to local information around foreground objects; meanwhile, images in detection datasets generally contain more complicated scenes in which large numbers of different objects may appear. Consequently, directly borrowing distillation strategies from classifiers can degrade detection performance, and the more complex structure of detectors makes earlier distillation methods inapplicable. As the "backbone with task heads" structure is widely used in modern computer vision models, researchers develop novel distillation methods mainly around this framework. The detector distillation strategies we introduce address the issues above and mainly focus on obtaining specific output logits and designing dedicated loss functions for different parts of the detector. To highlight foreground regions before distillation, feature maps from the backbone are often selected through regions of interest (RoIs) using masking operations, and different methods select various output logits from the teacher's task heads to guide the training of the student's task heads through specific matching and imitation schemes; a sketch of such masked feature imitation follows below.
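As an illustration of the foreground-masked feature imitation just described, the following is a minimal sketch; the module name, the 1x1 channel adapter, and the way the foreground mask is built are assumptions for exposition, not the design of any single surveyed method.

```python
# Minimal sketch of foreground-masked feature imitation for detectors.
import torch
import torch.nn as nn

class MaskedFeatureImitation(nn.Module):
    """Imitate teacher backbone features only inside foreground regions."""
    def __init__(self, student_ch, teacher_ch):
        super().__init__()
        # A 1x1 adapter maps the student's channel width to the teacher's,
        # as the student backbone is usually thinner than the teacher's.
        self.adapter = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, f_student, f_teacher, fg_mask):
        # fg_mask: (N, 1, H, W) binary map marking regions near ground-truth
        # boxes (e.g., derived from RoIs), so background clutter is ignored.
        diff = (self.adapter(f_student) - f_teacher) ** 2
        masked = diff * fg_mask
        # Normalize by the number of foreground positions so the loss scale
        # does not depend on how many objects an image contains.
        return masked.sum() / fg_mask.sum().clamp(min=1.0)
```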
Semantic segmentation requires more global information than object detection or instance segmentation, as it performs pixel-wise classification over the whole image. One critical factor for correctly classifying pixels is the analysis of inter-pixel relationships; hence, distillation methods for semantic segmentation exploit pixels both in the output masks and in the feature maps of hidden layers. The distillation strategies introduced here mainly apply hierarchical distillation to different parts of the model, e.g., imitating the full output classification mask, imitating full feature maps, matching similarity matrices (sketched below), and using conditional GANs (cGANs) for auxiliary imitation. The former two approaches are fundamental practices in model distillation. In contrast, to make the segmentation model's pixel-wise knowledge more "compact" after compression, some distillation methods compute similarities with the student from compressed features instead of the original ones. When cGANs are used to align the student's features with the teacher's, researchers introduce the Wasserstein distance as a better metric for adversarial training. The final part of this paper summarizes previous work on model distillation for high-level semantic understanding, reviews remaining obstacles and unsolved problems in its current development, and predicts future research directions.
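To illustrate similarity-matrix matching, the sketch below computes pairwise inter-pixel cosine similarities on teacher and student feature maps and penalizes their difference; all names and the mean-squared penalty are illustrative assumptions rather than the formulation of a specific surveyed method.

```python
# Minimal sketch of pairwise inter-pixel similarity distillation
# for semantic segmentation.
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(f_student, f_teacher):
    """Match inter-pixel similarity structure instead of raw feature values."""
    def similarity(feat):
        n, c, h, w = feat.shape
        feat = feat.view(n, c, h * w)
        feat = F.normalize(feat, dim=1)               # unit norm per pixel
        return torch.bmm(feat.transpose(1, 2), feat)  # (N, HW, HW) cosine map
    # Teacher and student maps are assumed to share spatial size; in practice
    # the student map is interpolated (or both are pooled, since the HW x HW
    # matrix grows quadratically) before the similarities are compared.
    return F.mse_loss(similarity(f_student), similarity(f_teacher))
```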
Keywords: model distillation; deep learning; image classification; object detection; semantic segmentation; transfer learning