Survey on deep learning object detection
Zhao Yongqiang, Rao Yuan, Dong Shipeng, Zhang Junyi (Laboratory of Social Intelligence and Complex Data Processing, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China;
The task of object detection is to accurately and efficiently identify and locate objects of predefined categories in images: it aims to find the objects of interest, determine the category of each object, and give the boundary of each object. Since Hinton proposed using deep neural networks to automatically learn high-level features from multimedia data, deep learning-based object detection has become an important research hotspot in computer vision. With the wide application of deep learning, the accuracy and efficiency of object detection have greatly improved. However, deep learning-based object detection still faces four key technical challenges: improving and optimizing the mainstream object detection algorithms to balance detection speed and accuracy, improving small object detection accuracy, achieving multiclass object detection, and making detection models lightweight. In view of these challenges, this study analyzes and summarizes existing research methods from different aspects. On the basis of an extensive literature review, it analyzes methods for improving and optimizing mainstream object detection algorithms from three aspects: improvements to two-stage algorithms, improvements to single-stage algorithms, and combinations of two-stage and single-stage algorithms. For two-stage algorithms, the study mainly describes classical methods such as R-CNN (region-based convolutional neural network), SPPNet (spatial pyramid pooling network), Fast R-CNN, and Faster R-CNN, as well as state-of-the-art methods including Mask R-CNN, Soft-NMS (soft non-maximum suppression), and Softer-NMS.
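Soft-NMS, listed above among the two-stage refinements, changes the post-processing step rather than the network: instead of discarding every box that overlaps the current highest-scoring box, it decays the overlapping boxes' scores. The following is a minimal sketch of the Gaussian variant, assuming boxes in (x1, y1, x2, y2) form; the function names, the `sigma=0.5` default, and the score threshold are illustrative assumptions, not settings taken from this survey.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: List[Box], scores: List[float],
             sigma: float = 0.5,
             score_thresh: float = 0.001) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS (sketch).

    Repeatedly picks the highest-scoring remaining box, then multiplies the
    score of every other remaining box by exp(-IoU^2 / sigma). Boxes whose
    score falls below score_thresh are dropped. Returns (index, score) pairs
    in selection order.
    """
    remaining = {i: s for i, s in enumerate(scores)}
    kept: List[Tuple[int, float]] = []
    while remaining:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        kept.append((best, best_score))
        for i in list(remaining):
            overlap = iou(boxes[best], boxes[i])
            remaining[i] *= math.exp(-(overlap ** 2) / sigma)
            if remaining[i] < score_thresh:
                del remaining[i]
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for idx, score in soft_nms(boxes, scores):
        print(idx, round(score, 3))  # prints 0 0.9 / 2 0.7 / 1 0.317
```

Boxes 0 and 1 overlap with IoU of about 0.68, so classic NMS with a 0.5 threshold would delete box 1 outright; the Gaussian decay instead keeps it with a reduced score, which is what helps when two true objects overlap heavily.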
For single-stage algorithms, it mainly describes classical methods such as YOLOv1 (you only look once), SSD (single shot multibox detector), and YOLOv2, as well as the state-of-the-art YOLOv3. For combinations of two-stage and single-stage algorithms, it mainly describes RON (reverse connection with objectness prior networks) and RefineDet. The study analyzes and summarizes methods for improving small object detection accuracy from five perspectives: using new backbone networks, enlarging the receptive field, feature fusion, cascaded convolutional neural networks, and modifying the model's training method. The new backbone networks mainly include DetNet, DenseNet, and DarkNet. DarkNet is introduced in detail in the discussion of single-stage algorithm improvements; it comprises two architectures: DarkNet-19, used in YOLOv2, and DarkNet-53, used in YOLOv3. Algorithms that enlarge the receptive field mainly include RFB (receptive field block) Net and TridentNet. Feature fusion methods mainly involve feature pyramid networks, DES (detection with enriched semantics), and NAS-FPN (neural architecture search feature pyramid network). Cascaded convolutional neural network algorithms mainly include Cascade R-CNN and HRNet. Algorithms that optimize the training method mainly include YOLOv2, SNIP (scale normalization for image pyramids), and Perceptual GAN (generative adversarial network). Methods for multiclass object detection are analyzed from the perspectives of training method and network structure. Algorithms that optimize the training method mainly include LSDA (large scale detection through adaptation), YOLO9000, and Soft Sampling.
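Enlarging the receptive field, as RFB Net and TridentNet do, relies on dilated convolution: a k × k kernel with dilation d covers d(k − 1) + 1 input positions without adding parameters. A minimal sketch of the standard receptive-field recurrence for a stack of convolution layers follows; the function name and the `(kernel, stride, dilation)` tuple layout are assumptions for illustration, and padding is ignored because it does not change the receptive-field size.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: (kernel_size, stride, dilation) per layer, from input to output.
    Recurrence: rf += (k_eff - 1) * jump, then jump *= stride, where jump is
    the distance in input pixels between adjacent positions of the current
    feature map.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1  # effective extent of a dilated kernel
        rf += (k_eff - 1) * jump
        jump *= s
    return rf


if __name__ == "__main__":
    plain = [(3, 1, 1)] * 3    # three ordinary 3x3 convs, stride 1
    dilated = [(3, 1, 2)] * 3  # same weights count, dilation 2
    print(receptive_field(plain))    # 7
    print(receptive_field(dilated))  # 13
```

With the same number of weights, the dilated stack sees a 13 × 13 window instead of 7 × 7, which is why dilation is a cheap way to give features more context for small objects.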
Algorithms that improve the network structure mainly include R-FCN-3000. The study analyzes lightweight detection models from the perspective of network structure, covering ShuffleNetv1, ShuffleNetv2, MobileNetv1, MobileNetv2, and MobileNetv3. MobileNetv1 uses depthwise separable convolution to reduce the number of parameters and the computational cost of the model, and uses pointwise (1 × 1) convolution to let information flow between feature maps. MobileNetv2 uses linear bottlenecks, removing the nonlinear activation after low-dimensional output layers to preserve the model's expressive ability, and also adopts inverted residual blocks. MobileNetv3 combines complementary search techniques with improvements to the network structure to raise the model's detection accuracy and speed. The study introduces in detail general-purpose datasets such as Caltech, Tiny Images, CIFAR, SUN, Places, and Open Images, and commonly used benchmark datasets including PASCAL VOC 2007, PASCAL VOC 2012, MS COCO (common objects in context), and ImageNet. The information on each dataset is summarized in a table of general datasets giving the dataset name, total number of images, number of categories, image size, starting year, and characteristics. The main performance metrics of object detection algorithms, namely accuracy, precision, recall, average precision (AP), and mean average precision (mAP), are also introduced in detail. Finally, based on these performance metrics, the research progress on the four key technical challenges is compared and analyzed.
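The saving from MobileNetv1's depthwise separable convolution can be checked with simple parameter arithmetic. A standard k × k convolution needs k²·C_in·C_out weights; the separable version needs k²·C_in for the depthwise step plus C_in·C_out for the pointwise 1 × 1 step, so the ratio is 1/C_out + 1/k². The sketch below ignores biases and batch normalization, and the function names and the 3 × 3, 256-channel example are assumptions for illustration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv mixes information across channels
    return depthwise + pointwise


def mult_adds(weights: int, h_out: int, w_out: int) -> int:
    """Multiply-adds: every weight is applied once per output position."""
    return weights * h_out * w_out


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, sep / std)              # 589824 67840 ~0.115
    print(mult_adds(std, 14, 14), mult_adds(sep, 14, 14))
```

For a 3 × 3 kernel the separable form costs roughly 1/8 to 1/9 of the standard one, dominated by the 1/k² term once C_out is large.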
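Precision, recall, AP, and mAP are computed from detections ranked by confidence. Conventions differ between benchmarks, so the following is one common variant rather than the survey's own definition: it uses all-point interpolation (area under the monotonically non-increasing precision envelope), as in PASCAL VOC evaluation from 2010 onward, whereas MS COCO averages 101-point AP over IoU thresholds from 0.5 to 0.95. Matching each detection to a ground-truth box (for example at IoU ≥ 0.5) is assumed to have been done already, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_recall(is_tp: List[bool], num_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative precision/recall for detections sorted by descending score.

    is_tp[i] is True if detection i matched a not-yet-matched ground truth.
    num_gt is the number of ground-truth objects (assumed > 0).
    """
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    return precisions, recalls


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP: area under the precision envelope."""
    precisions, recalls = precision_recall(is_tp, num_gt)
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Make precision non-increasing by sweeping from the right.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class: Dict[str, Tuple[List[bool], int]]) -> float:
    """mAP: unweighted mean of per-class AP."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    results = {
        "car": ([True, False, True, True, False], 4),
        "person": ([True, True], 2),
    }
    print(average_precision(*results["car"]))   # 0.625
    print(mean_average_precision(results))      # 0.8125
```

In the "car" example, recall never reaches 1 because only three of the four cars are found, so the final quarter of the recall axis contributes nothing to AP.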
In addition, a table describes the performance of representative object detection algorithms in terms of algorithm name, backbone network, input image size, test dataset, detection accuracy, detection speed, and whether the algorithm is single-stage or two-stage. Based on this review of traditional object detection algorithms, improved and optimized mainstream algorithms, algorithms for improving small object detection accuracy, and multiclass object detection algorithms, the study anticipates the open problems and future research directions in object detection. Object detection remains a research hotspot in computer vision and pattern recognition: high-precision, efficient algorithms continue to be proposed, and more research directions will emerge. Several key technical problems in deep learning-based object detection remain to be solved. Future research directions mainly include how to adapt models to the detection needs of specific scenarios, how to achieve accurate object detection when prior knowledge is lacking, how to obtain high-performance backbone networks and information, how to incorporate rich image semantic information, how to improve the interpretability of deep learning models, and how to automatically find the optimal network architecture.