Survey on deep learning object detection
2020, Vol. 25, No. 4, pp. 629-654
Print publication date: 2020-04-16
Accepted: 2019-09-22
DOI: 10.11834/jig.190307
Yongqiang Zhao, Yuan Rao, Shipeng Dong, Junyi Zhang. Survey on deep learning object detection[J]. Journal of Image and Graphics, 2020, 25(4): 629-654.
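The task of object detection is to accurately and efficiently identify and locate instances of a large number of predefined object categories in images. With the wide application of deep learning, both the accuracy and the efficiency of object detection have improved considerably. However, object detection based on deep learning still faces key technical challenges: improving and optimizing the performance of mainstream object detection algorithms, raising the detection accuracy for small objects, achieving multi-class object detection, and making detection models lightweight. To address these challenges, and on the basis of an extensive literature survey, this paper analyzes methods for improving and optimizing mainstream object detection algorithms from the perspective of improving and combining two-stage and single-stage algorithms; methods for improving small object detection accuracy from the perspectives of backbone networks, enlarged receptive fields, feature fusion, cascaded convolutional neural networks, and model training methods; methods for multi-class object detection from the perspectives of training method and network structure; and methods for lightweight detection models from the perspective of network structure. In addition, the general datasets for object detection are introduced in detail, the performance of representative algorithms in the field is compared and analyzed from four aspects, and the open problems and future research directions of object detection are forecast. Object detection remains a popular research topic in computer vision and pattern recognition; more accurate and efficient algorithms continue to be proposed, and the field will develop in more research directions in the future.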
The task of object detection is to accurately and efficiently identify and locate a large number of predefined objects in images. It aims to locate the objects of interest in an image, accurately determine the category of each object, and provide the boundary of each object. Since Hinton proposed using deep neural networks to learn high-level features automatically from multimedia data, object detection based on deep learning has become an important research hotspot in computer vision. With the wide application of deep learning, the accuracy and efficiency of object detection have greatly improved. However, object detection based on deep learning still faces four key technical challenges, namely, improving and optimizing the mainstream object detection algorithms while balancing detection speed and accuracy, improving small object detection accuracy, achieving multiclass object detection, and lightweighting the detection model. In view of these challenges, this study analyzes and summarizes the existing research methods from different aspects.
On the basis of extensive literature research, this work analyzes the methods for improving and optimizing the mainstream object detection algorithms from three aspects: the improvement of two-stage object detection algorithms, the improvement of single-stage object detection algorithms, and the combination of two-stage and single-stage object detection algorithms. For the two-stage algorithms, classical methods such as R-CNN (region-based convolutional neural network), SPPNet (spatial pyramid pooling net), Fast R-CNN, and Faster R-CNN, and state-of-the-art methods including Mask R-CNN, Soft-NMS (non-maximum suppression), and Softer-NMS are mainly described. For the single-stage algorithms, classical methods such as YOLOv1 (you only look once), SSD (single shot MultiBox detector), and YOLOv2, and state-of-the-art methods including YOLOv3 are mainly described. For the combination of two-stage and single-stage algorithms, the RON (reverse connection with objectness prior networks) and RefineDet algorithms are mainly described.
This study analyzes and summarizes the methods for improving the accuracy of small object detection from five perspectives: using a new backbone network, enlarging the receptive field, feature fusion, cascaded convolutional neural networks, and modifying the training method of the model. The new backbone networks discussed are DetNet, DenseNet, and DarkNet. DarkNet is introduced in detail in the discussion of improved single-stage object detection algorithms and comprises two backbone architectures: DarkNet-19, used in YOLOv2, and DarkNet-53, used in YOLOv3. The algorithms for enlarging the receptive field mainly include RFB (receptive field block) Net and TridentNet. The feature fusion methods mainly involve feature pyramid networks, DES (detection with enriched semantics), and NAS-FPN (neural architecture search feature pyramid network). The cascaded convolutional neural network algorithms mainly include Cascade R-CNN and HRNet. The algorithms that optimize the training method mainly consist of YOLOv2, SNIP (scale normalization for image pyramids), and Perceptual GAN (generative adversarial network).
Multiclass object detection methods are analyzed from the perspectives of training method and network structure. The algorithms that optimize the training method mainly include large scale detection through adaptation (LSDA), YOLO9000, and Soft Sampling. The algorithms that improve the network structure mainly include R-FCN-3000. Lightweight detection models are analyzed from the perspective of network structure, covering ShuffleNetv1, ShuffleNetv2, MobileNetv1, MobileNetv2, and MobileNetv3. MobileNetv1 uses depthwise separable convolution to reduce the parameters and computational complexity of the model and employs pointwise convolution to handle the flow of information between feature maps. MobileNetv2 uses linear bottlenecks, which remove the nonlinear activation layer after the low-dimensional output layer and thus preserve the expressive ability of the model; it also uses inverted residual blocks to improve the model. MobileNetv3 combines complementary search techniques with network structure improvements to improve the detection accuracy and speed of the model.
The common datasets, such as Caltech, Tiny Images, CIFAR, SUN, Places, and Open Images, and the widely used datasets, including PASCAL VOC 2007, PASCAL VOC 2012, MS COCO (common objects in context), and ImageNet, are introduced in detail. The information on each dataset is summarized in a table that lists the dataset name, total number of images, number of categories, image size, starting year, and characteristics. The main performance indicators of object detection algorithms, such as accuracy, precision, recall, average precision, and mean average precision, are also introduced in detail. Finally, representative algorithms are compared and analyzed with respect to the four key technical challenges. A table describes the performance of some representative algorithms in terms of algorithm name, backbone network, input image size, test dataset, detection accuracy, detection speed, and whether the algorithm is single-stage or two-stage.
On the basis of the traditional object detection algorithms, the algorithms that improve and optimize the mainstream detectors, the small object detection algorithms, and the multiclass object detection algorithms, the problems still to be solved in object detection and the future research directions are forecast. Object detection remains a hot topic in computer vision and pattern recognition. High-precision and efficient algorithms are proposed constantly, and more research directions will develop in the future. The key technical problems of object detection based on deep learning remain to be solved in the next step. Future research directions mainly include how to adapt models to the detection needs of specific scenarios, how to achieve accurate object detection when prior knowledge is lacking, how to obtain high-performance backbone networks and information, how to add rich image semantic information, how to improve the interpretability of deep learning models, and how to automate the search for the optimal network architecture.
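The abstract's remark that MobileNetv1 uses depthwise separable convolution to reduce the parameters of the model can be illustrated with a small calculation. The sketch below is only an illustration and is not taken from the surveyed paper: the function names and the layer sizes (a 3 x 3 kernel with 256 input and 256 output channels) are assumptions chosen for the example, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes information across channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not from the paper).
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # algebraically equal to 1 / c_out + 1 / k**2

print(f"standard: {std}, depthwise separable: {sep}, ratio: {ratio:.3f}")
```

Under these assumed sizes the separable layer needs roughly one ninth of the weights of the standard layer, because the ratio is 1/c_out + 1/k^2 and the 1/k^2 term dominates when the channel count is large. The same ratio applies to the multiply-accumulate count, since both layers are applied at every output position.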
object detection; deep learning; small object; multi-class; lightweighting
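The performance indicators named in the abstract (precision, recall, average precision, and mean average precision) can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the evaluation protocol of any particular benchmark: the function names are hypothetical, each detection is assumed to be already marked as a true or false positive (for example by an IoU threshold), and AP is computed as the area under the precision-recall curve with all-point interpolation.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(is_tp, num_gt):
    """AP for one class; is_tp lists detections sorted by descending confidence."""
    precisions, recalls = [], []
    tp = fp = 0
    for hit in is_tp:
        tp += hit
        fp += not hit
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy data (assumed): three ranked detections, two ground-truth objects.
p, r = precision_recall(tp=8, fp=2, fn=2)
ap_cat = average_precision([True, False, True], num_gt=2)
m_ap = mean_average_precision({"cat": ap_cat, "dog": 0.5})
```

In the toy example the ranked detections give an AP of 5/6 for the hypothetical "cat" class, and averaging it with an assumed AP of 0.5 for "dog" gives an mAP of 2/3.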