Survey on deep learning object detection

Yongqiang Zhao; Yuan Rao; Shipeng Dong; Junyi Zhang

doi:10.11834/jig.190307


2020, Vol. 25, No. 4, pp. 629-654
Print publication date: 2020-04-16

Accepted: 2019-09-22


Yongqiang Zhao, Yuan Rao, Shipeng Dong, Junyi Zhang. Survey on deep learning object detection[J]. Journal of Image and Graphics, 2020, 25(4): 629-654. DOI: 10.11834/jig.190307.

Summary

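The task of object detection is to accurately and efficiently identify and localize instances of a large number of predefined object categories in images. With the wide application of deep learning, both the accuracy and the efficiency of object detection have improved considerably. However, deep learning based object detection still faces key technical challenges: improving and optimizing the performance of mainstream detection algorithms, raising the detection accuracy for small objects, achieving multi-class object detection, and making detection models lightweight. To address these challenges, and on the basis of an extensive literature survey, this paper analyzes methods for improving and optimizing mainstream detection algorithms from the perspective of improving and combining two-stage and single-stage detectors; analyzes methods for improving small object detection accuracy from the perspectives of backbone networks, enlarged receptive fields, feature fusion, cascaded convolutional neural networks, and model training strategies; analyzes methods for multi-class object detection from the perspectives of training strategy and network structure; and analyzes methods for lightweight detection models from the perspective of network structure. In addition, the general-purpose datasets for object detection are introduced in detail, the performance of representative algorithms in the field is compared and analyzed from four aspects, and open problems and future research directions are forecast and discussed. Object detection remains a popular research topic in computer vision and pattern recognition; more accurate and efficient algorithms continue to be proposed, and the field will expand in more research directions in the future.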

Abstract

The task of object detection is to accurately and efficiently identify and locate a large number of predefined objects in images. It aims to locate the objects of interest in an image, accurately determine the category of each object, and provide the boundaries of each object. Since Hinton proposed using deep neural networks to automatically learn high-level features from multimedia data, object detection based on deep learning has become an important research hotspot in computer vision. With the wide application of deep learning, the accuracy and efficiency of object detection have greatly improved. However, object detection based on deep learning still faces four key technical challenges, namely, improving and optimizing mainstream object detection algorithms to balance detection speed and accuracy, improving small object detection accuracy, achieving multiclass object detection, and lightweighting the detection model. In view of these challenges, this study analyzes and summarizes the existing research methods from different aspects.

On the basis of extensive literature research, this work analyzes the methods for improving and optimizing mainstream object detection algorithms from three aspects: the improvement of two-stage object detection algorithms, the improvement of single-stage object detection algorithms, and the combination of two-stage and single-stage object detection algorithms. For two-stage algorithms, classical methods such as R-CNN (region based convolutional neural network), SPPNet (spatial pyramid pooling net), Fast R-CNN, and Faster R-CNN, and state-of-the-art methods including Mask R-CNN, Soft-NMS (non maximum suppression), and Softer-NMS are mainly described. For single-stage algorithms, classical methods such as YOLO (you only look once) v1, SSD (single shot multibox detector), and YOLOv2, and state-of-the-art methods including YOLOv3 are mainly described. For the combination of two-stage and single-stage algorithms, RON (reverse connection with objectness prior networks) and RefineDet are mainly described.

This study analyzes and summarizes the methods for improving the accuracy of small object detection from five perspectives: using a new backbone network, enlarging the receptive field, feature fusion, cascaded convolutional neural networks, and modifying the training method of the model. The new backbone networks mainly include DetNet, DenseNet, and DarkNet. DarkNet is introduced in detail in the discussion of improvements to single-stage object detection algorithms; it mainly comprises two backbone architectures, DarkNet-19 as applied in YOLOv2 and DarkNet-53 as applied in YOLOv3. The algorithms for enlarging the receptive field mainly include RFB (receptive field block) Net and TridentNet. The feature fusion methods mainly involve feature pyramid networks, DES (detection with enriched semantics), and NAS-FPN (neural architecture search-feature pyramid networks). The algorithms based on cascaded convolutional neural networks mainly include Cascade R-CNN and HRNet. The algorithms that optimize the training mode of the model mainly consist of YOLOv2, SNIP (scale normalization for image pyramids), and Perceptual GAN (generative adversarial networks).

Methods for multiclass object detection are analyzed from the perspectives of training method and network structure. The algorithms that optimize the training method mainly include LSDA (large scale detection through adaptation), YOLO9000, and Soft Sampling. The algorithms that improve the network structure mainly include R-FCN-3000. This study also analyzes the methods used in lightweight detection models from the perspective of network structure, such as ShuffleNetv1, ShuffleNetv2, MobileNetv1, MobileNetv2, and MobileNetv3. MobileNetv1 uses depthwise separable convolution to reduce the parameters and computational complexity of the model, and employs pointwise convolution to solve the problem of information flow between the feature maps. MobileNetv2 uses linear bottlenecks, which remove the nonlinear activation layer after the low-dimensional output layer and thus preserve the expressive ability of the model. MobileNetv2 also utilizes inverted residual blocks to improve the model. MobileNetv3 combines complementary search techniques with network structure improvements to improve the detection accuracy and speed of the model.

Common datasets, such as Caltech, Tiny Images, Cifar, Sun, Places, and Open Images, and the commonly used datasets, including PASCAL VOC 2007, PASCAL VOC 2012, MS COCO (common objects in context), and ImageNet, are introduced in detail. The information on each dataset is summarized in a table of general datasets that lists the dataset name, total number of images, number of categories, image size, starting year, and characteristics of each dataset. The main performance indexes of object detection algorithms, such as accuracy, precision, recall, average precision, and mean average precision, are also introduced in detail. Finally, on the basis of these performance indicators, the algorithms addressing the four key technical challenges are compared and analyzed. A table describes the performance of some representative object detection algorithms in terms of algorithm name, backbone network, input image size, test dataset, detection accuracy, detection speed, and whether the algorithm is single-stage or two-stage. The table covers traditional object detection algorithms, improved and optimized mainstream object detection algorithms, algorithms for small object detection accuracy, and multiclass object detection algorithms.

On this basis, the problems to be solved in object detection and the future research directions are predicted and discussed. Object detection is still a hot spot in computer vision and pattern recognition. High-precision and efficient algorithms are proposed constantly, and more research directions will be developed in the future. The key technical problems of object detection based on deep learning remain to be solved in the next step. The future research directions mainly include how to make the model suitable for the detection needs of specific scenarios, how to achieve accurate object detection when prior knowledge is lacking, how to obtain high-performance backbone networks and information, how to add rich image semantic information, how to improve the interpretability of deep learning models, and how to automate the search for the optimal network architecture.
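
The parameter saving that the abstract attributes to depthwise separable convolution in MobileNetv1 can be illustrated with a simple weight count. The following is a minimal sketch and is not taken from the paper: the function names and the layer sizes (a 3 x 3 kernel with 128 input and 256 output channels) are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel, no channel mixing
    pointwise = c_in * c_out   # 1 x 1 conv that mixes information across channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

Under these assumptions the ratio of separable to standard weights equals 1/c_out + 1/k^2, so for 3 x 3 kernels the separable layer needs roughly one eighth to one ninth of the weights. The pointwise term is the part that restores information flow between channels, which the depthwise step alone does not provide.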

Keywords

object detection; deep learning; small object; multi-class; lightweighting

