Current Issue Cover
深度学习目标检测方法综述

赵永强1,2, 饶元1,2, 董世鹏1,2, 张君毅1,2(1.西安交通大学软件学院社会智能与复杂数据处理实验室, 西安 710049;2.
西安交通大学深圳研究院, 深圳 518057)

摘 要
目标检测的任务是从图像中精确且高效地识别、定位出大量预定义类别的物体实例。随着深度学习的广泛应用,目标检测的精确度和效率都得到了较大提升,但基于深度学习的目标检测仍面临改进与优化主流目标检测算法的性能、提高小目标物体检测精度、实现多类别物体检测、轻量化检测模型等关键技术的挑战。针对上述挑战,本文在广泛文献调研的基础上,从双阶段、单阶段目标检测算法的改进与结合的角度分析了改进与优化主流目标检测算法的方法,从骨干网络、增加视觉感受野、特征融合、级联卷积神经网络和模型的训练方式的角度分析了提升小目标检测精度的方法,从训练方式和网络结构的角度分析了用于多类别物体检测的方法,从网络结构的角度分析了用于轻量化检测模型的方法。此外,对目标检测的通用数据集进行了详细介绍,从4个方面对该领域代表性算法的性能表现进行了对比分析,对目标检测中待解决的问题与未来研究方向做出预测和展望。目标检测研究是计算机视觉和模式识别中备受青睐的热点,仍然有更多高精度和高效的算法相继提出,未来将朝着更多的研究方向发展。
关键词
Survey on deep learning object detection

Zhao Yongqiang1,2, Rao Yuan1,2, Dong Shipeng1,2, Zhang Junyi1,2(1.Laboratory of Social Intelligence and Complex Data Processing, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China;2.
Shenzhen Research Institution, Xi'an Jiaotong University, Shenzhen 518057, China)

Abstract
The task of object detection is to accurately and efficiently identify and locate a large number of predefined objects from images. It aims to locate interested objects from images, accurately determine the categories of each object, and provide the boundaries of each object. Since the proposal of Hinton on the use of deep neural network for automatic learning of high-level features in multimedia data, object detection based on deep learning has become an important research hotspot in computer vision. With the wide application of deep learning, the accuracy and efficiency of object detection are greatly improved. However, object detection based on deep learning still have four key technology challenges, namely, improving and optimizing the mainstream object detection algorithms, balancing the detection speed and accuracy, improving the small object detection accuracy, achieving multiclass object detection, and lightweighting the detection model. In view of the above challenges, this study analyzes and summarizes the existing research methods from different aspects. On the basis of extensive literature research, this work analyzed the methods of improving and optimizing the mainstream object detection algorithm from three aspects:the improvement of two-stage object detection algorithm, the improvement of single-stage object detection algorithm, and the combination of two-stage object detection algorithm and single-stage object detection algorithm. In the improvement of the two-stage object detection algorithm, some classical two-stage object detection algorithms, such as R-CNN (region based convolutional neural network), SPPNet(spatial pyramid pooling net), Fast R-CNN, and Faster R-CNN, and some state-of-the-art two-stage object detection algorithms, including Mask R-CNN, Soft-NMS(non maximum suppression), and Softer-NMS, are mainly described. In the improvement of single-stage object detection algorithm, some classical single-stage object detection algorithms, such as YOLO(you only look once)v1, SSD(single shot multiBox detector), and YOLOv2, and the state-of-the-art single-stage object detection algorithms, including YOLOv3, are mainly described. In the combination of two-stage and one-stage object detection algorithms, RON(reverse connection with objectness prior networks) and RefineDet algorithms are mainly described. This study analyzes and summarizes the methods to improve the accuracy of small object detection from five perspectives:using new backbone network, increasing visual field, feature fusion, cascade convolution neural network, and modifying the training method of the model. The new backbone network mainly introduces DetNet, DenseNet, and DarkNet. The backbone network DarkNet is introduced in detail in the improvement of single segment object detection algorithm. It mainly includes two backbone network architectures:DarkNet-19 application in YOLOv2 and DarkNet-53 application in YOLOv3. The related algorithms of increasing receptive field mainly include RFB(receptive field block) Net and TridentNet. The methods of feature fusion mainly involve feature pyramid networks, DES(detection with enriched semantics), and NAS-FPN(neural architecture search-feature pyramid networks). The related algorithms of cascade convolutional neural network mainly include Cascade R-CNN and HRNet. The related algorithms of model training mode optimization mainly consist of YOLOv2, SNIP(scale normalization for image pyramids), and Perceptual GAN(generative adversarial networks). In this study, the method of multiclass object detection is analyzed from the point of view of training method and network structure. The related algorithms of training method optimization mainly include large scale detection through Adaptation, YOLO9000, and Soft Sampling. The related algorithms of network structure improvement mainly include R-FCN-3000. This study analyzes the methods used in lightweight detection model from the perspective of network structure, such as ShuffleNetv1, ShuffleNetv2, MobileNetv1, MobileNetv2, and Mobile Netv3. MobileNetv1 uses depthwise separable convolution to reduce the parameters and computational complexity of the model, and employs pointwise convolution to solve the problem of information flow between the feature maps. MobileNetv2 uses linear bottlenecks to remove the nonlinear activation layer behind the small dimension output layer, thus ensuring the expressive ability of the model. MobileNetv2 also utilizes inverted residual block to improve the model. MobileNetv3 employs complementary search technology combination and network structure improvement to improve the detection accuracy and speed of the model. In this study, the common datasets, such as Caltech, Tiny Images, Cifar, Sun, Places, and Open Images, and the commonly used datasets, including PASCAL VOC 2007, PASCAL VOC 2012, MS COCO(common objects in context), and ImageNet, are introduced in detail. The information of each dataset is summarized, and a set of datasets is established. A table of general datasets is presented, and the dataset name, total images, number of categories, image size, started year, and characteristics of each dataset are introduced in detail. At the same time, the main performance indexes of object detection algorithms, such as accuracy, precision, recall, average precision, and mean average precision, are introduced in detail. Finally, according to the object detection, this work introduces the main performance indicators in detail. Four key technical challenges in the process of measurement, research, and development are compared and analyzed. In addition, a table is set up to describe the performance of some representative algorithms in object detection from the aspects of algorithm name, backbone network, input image size, test dataset, detection accuracy, detection speed, and single-stage or two-stage partition. The traditional object detection algorithm, the improvement and optimization algorithm of the mainstream object detection algorithm, the related information of the small object detection accuracy algorithm, and the multicategory object detection algorithm are improved, to predict and prospect the problems to be solved in object detection and the future research direction. The related research of object detection is still a hot spot in computer vision and pattern recognition. Several high-precision and efficient algorithms are proposed constantly, and increasing research directions will be developed in the future. The key technologies of object detection based on in-depth learning need to be solved in the next step. The future research directions mainly include how to make the model suitable for the detection needs of specific scenarios, how to achieve accurate object detection problems under the condition of lack of prior knowledge, how to obtain high-performance backbone network and information, how to add rich image semantic information, how to improve the interpretability of deep learning model, and how to automate the realization of the optimal network architecture.
Keywords

订阅号|日报