
宋涛, 张景涛, 李沩沩, 赵明富, 冉璐, 叶定兴, 杨贻晨, 岳岱衡(重庆理工大学)

Abstract

Objective Knowledge distillation has been widely studied as an effective and practical model-compression method in image classification. Because the complexity of object detection networks makes distillation harder, existing knowledge distillation methods for object detection perform poorly. To address this, a knowledge distillation method tailored to object detection, feature editing and reconstruction distillation based on knowledge enhancement, is proposed to compress object detection models effectively. Method To address the problem that existing methods distill only between corresponding feature layers of the teacher and student models and thus cannot fully exploit the "dark knowledge" hidden in the teacher model, spatial attention and channel attention are used to perform bottom-up and top-down multi-scale feature fusion of the teacher features, respectively, for knowledge enhancement. To address the problem that when the capability gap between teacher and student is too large the student cannot understand the teacher's knowledge and distillation performance is limited, part of the teacher's features are fused into the student's features as prior knowledge to narrow the representation-capability gap between the two models; detail information such as edges and contours is then deleted from the student features to realize feature editing, and the student is forced to recover the deleted details from the remaining features combined with the prior knowledge through a simple convolutional block, performing feature reconstruction, so that the student receives positive feedback in this process and learns better features. Result Experiments on two datasets compare three different types of ResNet50-based detectors, RetinaNet, Faster R-CNN, and FCOS, against four recent methods. On the VOC2007 test set, mAP improves over the baselines by 2.1%, 2.7%, and 3.8%, respectively; on the NEU-DET test set, by 2.7%, 2.6%, and 2.1%, all higher than the current best-performing algorithms. Conclusion The proposed method fully exploits the teacher model's capability, effectively improves the student model's performance, and is applicable to multiple types of object detectors.
Feature editing and reconstruction distillation based on knowledge enhancement

Song Tao, Zhang Jingtao, Li Weiwei, Zhao Mingfu, Ran Lu, Ye Dingxing, Yang Yichen, Yue Daiheng (Chongqing University of Technology)

Abstract: Objective In recent years, convolutional neural networks have shown great potential in various fields of computer vision thanks to their excellent feature-extraction ability. As model performance has increased, however, model size has become increasingly bloated: the huge number of parameters slows inference, and even with GPU acceleration many application scenarios cannot be served in real time. The memory and storage footprint also raises the cost of use, so these large models are difficult to deploy and run on mobile devices or embedded platforms with limited compute and storage, restricting their adoption. How to compress large deep neural network models is therefore a key issue. Knowledge distillation is a simple and effective model-compression method. Unlike model pruning or parameter quantization, knowledge distillation is essentially a special training procedure: it does not change the model structure or parameters directly. During training, in addition to learning the hard labels of the training set, the student model also uses the classification outputs of the teacher model as soft labels to guide its learning. In this way, the "dark knowledge" hidden in the powerful but structurally bloated teacher model is transferred to a student model with a simpler network structure and fewer parameters, so that the student, with its smaller parameter volume and faster inference, can achieve accuracy comparable to the teacher, thus achieving model compression. However, object detection requires both classifying each target and outputting its precise position in the image, so merely learning the labels output by the teacher model is not enough; this is why traditional classification-oriented knowledge distillation performs poorly on object detection. Moreover, detector network structures are more complex, so existing detection-oriented distillation methods usually have the student's features directly mimic the teacher's features rather than the teacher's labels. These methods still have multiple limitations, so a new knowledge distillation method applicable to object detection is proposed: feature editing and reconstruction distillation based on knowledge enhancement, which achieves effective compression of object detection models. Method Two modules are constructed to address two common problems of current distillation methods for object detection: 1) a knowledge enhancement module, and 2) a feature editing and reconstruction module. Existing methods distill only between corresponding feature layers of the teacher and student models and therefore cannot fully exploit the "dark knowledge" hidden in the teacher model.
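The classical soft-label mechanism referred to in the Objective, introduced for classification by Hinton et al., can be sketched in plain Python. This is a minimal illustration only, not the feature-level distillation the paper proposes; the function names and the temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature softens the
    distribution, exposing the teacher's 'dark knowledge' about
    non-target classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between the teacher's softened outputs (soft labels)
    and the student's softened outputs, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    cross_entropy = -sum(pt * math.log(ps)
                         for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * cross_entropy
```

The loss is minimized when the student's softened distribution matches the teacher's; in practice this term is combined with the ordinary hard-label loss.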
To address this, the knowledge enhancement module performs bottom-up and top-down multi-scale feature fusion of the teacher features through spatial attention and channel attention, respectively, for knowledge enhancement. In addition, as the performance of the teacher model keeps improving, the capability gap between student and teacher grows, and the student's performance gain from distillation gradually saturates or even declines. The feature editing and reconstruction module therefore builds a new distillation paradigm: part of the teacher's features are fused into the student's features as prior knowledge to narrow the representation-capability gap between the two models; detail information such as edges and contours is randomly deleted from the student features through a pixel-level mask to realize feature editing; the student is then forced to recover the deleted details from the remaining features combined with the prior knowledge through a simple convolutional block, performing feature reconstruction. As the model learns to optimize the quality of the reconstructed feature maps, gradients back-propagate to the student's original feature maps, so the student learns features with stronger representational capability. Result Experiments were conducted with three different types of detectors, RetinaNet, Faster R-CNN, and FCOS, on the general object detection dataset VOC2007 and the steel surface defect dataset NEU-DET, using ResNet101-ResNet50 as the teacher-student pair.
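As a rough illustration of the feature editing step, the stdlib-only sketch below fuses a fraction of the teacher's activations into the student map as prior knowledge, then deletes randomly chosen pixels with a pixel-level mask. The function name, the single-channel grid layout, and the ratio values are hypothetical simplifications; the convolutional block that reconstructs the deleted details in the actual method is omitted here.

```python
import random

def edit_features(student_fmap, teacher_fmap,
                  drop_ratio=0.5, prior_ratio=0.2, seed=0):
    """Feature editing sketch: (1) inject a fraction of teacher
    activations into the student map as prior knowledge, then
    (2) zero out randomly selected pixels so the student must later
    reconstruct the deleted detail (edges, contours, ...).
    Feature maps are H x W grids (single channel for clarity)."""
    rng = random.Random(seed)  # seeded for reproducibility
    height, width = len(student_fmap), len(student_fmap[0])
    edited = [row[:] for row in student_fmap]  # do not modify the input
    for i in range(height):
        for j in range(width):
            # step 1: fuse teacher features in as a priori knowledge
            if rng.random() < prior_ratio:
                edited[i][j] = teacher_fmap[i][j]
            # step 2: pixel-level mask deletes detail information
            if rng.random() < drop_ratio:
                edited[i][j] = 0.0
    return edited
```

In the full method, a small convolutional block would take the edited map and be trained to recover the masked pixels, with the reconstruction loss driving the student toward more expressive features.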
First, feature-map visualization shows that the distilled detector responds far less to background noise and attends more to key foreground regions; in the visualized detection results, the distilled detector markedly reduces false and missed detections. Evaluated with the mAP metric, the improvements over the baselines are 2.1%, 2.7%, and 3.8% on the VOC2007 test set, and 2.7%, 2.6%, and 2.1% on the NEU-DET test set, respectively. Conclusion This study proposes a new knowledge distillation method for object detection. Experimental results show that it significantly sharpens the focus of the detector's feature maps on key target regions and suppresses noise interference, thereby reducing false-detection and missed-detection rates; its accuracy gains exceed several state-of-the-art algorithms, and it performs well on both general object detection datasets and specialized defect detection datasets, showing good generalization while being applicable to many types of detectors.