Scale changing in general object detection: a survey (目标检测中的尺度变换应用综述)
2020, Vol. 25, No. 9, pages 1754-1772
Received: 2019-12-13; Revised: 2020-03-02; Accepted: 2020-03-09; Published in print: 2020-09-16
DOI: 10.11834/jig.190624
Object detection attempts to mark every object instance appearing in a natural image with a given label and has been widely used in fields such as autonomous driving and security monitoring. With the spread of deep learning, general object detection frameworks based on convolutional neural networks (CNNs) have achieved far better detection results than other methods. However, owing to the characteristics of CNNs, general object detection still faces challenges such as scale, illumination, and occlusion. The purpose of this paper is to give a comprehensive survey of scale-oriented object detection strategies in CNN architectures. We first review the development of general object detection and the main datasets used, covering the two families of detection frameworks and their evolution: the lineage and structural innovations of two-stage, region-proposal-based detectors, and the three schools of one-stage, single-regression detectors. We then give a simple taxonomy of ideas for mitigating the scale problem, including multi-feature fusion strategies, convolution deformations for the receptive field, and training strategy designs. Finally, we report the detection accuracy of the different frameworks on common datasets for targets of different sizes and discuss possible future directions for scale transformation.
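The receptive-field deformations discussed in this survey (dilated kernels enlarging the region each feature sees) follow standard convolution arithmetic. As a small illustrative sketch, not code from any surveyed paper, the helpers below compute the effective kernel size of a dilated convolution and the receptive field of a stack of layers:

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation d."""
    return d * (k - 1) + 1

def stacked_receptive_field(layers):
    """Receptive field of a stack of conv layers.

    Each layer is (kernel, dilation, stride); the receptive field grows by
    (effective_kernel - 1) times the product of the preceding strides.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# Three plain 3x3 convolutions, stride 1: receptive field 7.
print(stacked_receptive_field([(3, 1, 1)] * 3))                    # 7
# Same depth with dilations 1, 2, 4, as in dilated backbones: 15.
print(stacked_receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # 15
```

This is why dilation yields dense high-level features: the receptive field grows without the stride or pooling that would shrink the feature map.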
General object detection has been one of the most important research topics in computer vision. The task attempts to locate and label every object instance that appears in a natural image using a given set of labels, and it has been widely used in application scenarios such as autonomous driving and security monitoring. With the development and popularization of deep learning, semantic information has become much easier to extract from images, and general object detection frameworks based on convolutional neural networks (CNNs) have obtained clearly better results than other detection methods. Because the large-scale datasets built for this task are of higher quality than those of many other vision tasks and its metrics are well defined, object detection has evolved rapidly among CNN-based vision tasks. However, general object detection still faces many challenges, such as scale variation, illumination changes, and occlusion, due to the limitations of the CNN structure. Given that the features extracted by CNNs are sensitive to scale, multiscale detection is valuable but challenging in CNN-based detection. Research on scale transformation is also of reference value for other small-target or pixel-level tasks, such as semantic segmentation and pose estimation. This study aims to provide a comprehensive overview of detection strategies for scale in CNN architectures, that is, how to locate and classify targets of different sizes robustly.

First, we introduce the development of general object detection and the main datasets used. We then introduce the two categories of general object detection frameworks. The first category, two-stage strategies, obtains region proposals and then selects among them by classification confidence; it mostly takes the region-based convolutional neural network (R-CNN) as its baseline. As the R-CNN structure developed, all of its components were transformed into specific convolution layers, forming an end-to-end structure, and several techniques were designed on top of the baseline to solve specific problems and improve its robustness to all kinds of object regions. The second category, one-stage strategies, obtains region locations and categories with a single regression. It started with the structure named "you only look once" (YOLO), which regresses object information for every divided block; the baseline later became fully convolutional and end to end and exploited deeper, more effective features. This line of work has also become popular since focal loss was proposed, because focal loss addresses the imbalance between positive and negative samples caused by dense regression. In addition, methods that detect objects via keypoint locations, borrowing from pose estimation, also obtain satisfactory results in general object detection.

Second, we present a simple classification of optimization ideas for the scale problem: multi-feature fusion strategies, convolution deformations for receptive fields, and training strategy designs. Multi-feature fusion strategies target object classes that do not always appear at a single small scale. Multi-feature fusion obtains semantic information from different image scales and fuses it to attain the most suitable scale; it can also effectively identify different sizes of objects within one class. Widely used structures can be divided into those based on single-shot detection and those based on feature pyramid networks, and some structures adopt a skip-layer fusion design. In a receptive field, every feature corresponds to a region of the image or of a lower-level feature map, and a dedicated receptive-field design can handle targets that always appear small in the image. The receptive field of a standard convolution equals its kernel size, so special convolution kernels have been designed. Dilated kernels are the most common deformation; combined with specially designed pooling layers, they yield dense high-level features. Some scholars have also designed an offset layer that lets the convolution kernel learn the most useful deformation automatically. Training strategies can likewise be designed for small targets: a dataset containing only small objects can be constructed, images of different sizes can be fed to the network in an orderly manner, and resampling images is also a common strategy.

Finally, we provide the detection accuracy of the different frameworks for targets of different sizes on common datasets. The results are obtained on the Microsoft common objects in context (MS COCO) dataset. We use average precision (AP) to measure detection quality, reporting results for small, medium, and large targets and for different intersection-over-union (IoU) thresholds, which shows the influence of scale changes. This study also provides a set of possible future directions for scale transformation, including how to obtain robust features and detection modules and how to design training datasets.
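The evaluation protocol described above can be made concrete with a short sketch. The snippet below computes box IoU and the MS COCO small/medium/large split (area thresholds of 32² and 96² pixels); the helper names are ours for illustration, not part of the COCO toolkit.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def coco_size_bucket(box):
    """MS COCO convention: small < 32^2 px, medium < 96^2 px, else large."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# A prediction shifted by 10 px against a 100 x 100 ground-truth box:
print(round(iou((0, 0, 100, 100), (10, 10, 110, 110)), 3))  # 0.681
print(coco_size_bucket((0, 0, 100, 100)))                   # large
```

A detection counts as a true positive only when its IoU with a ground-truth box exceeds the chosen threshold, and AP is then reported per size bucket, which is what makes the small/medium/large columns in the comparison tables comparable across frameworks.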
Alexe B, Deselaers T and Ferrari V. 2012. Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11): 2189-2202 [DOI:10.1109/TPAMI.2012.28]
Arbeláez P, Pont-Tuset J, Barron J, Marques F and Malik J. 2014. Multiscale combinatorial grouping//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE: 328-335 [DOI:10.1109/CVPR.2014.49]
Bell S, Zitnick C L, Bala K and Girshick R. 2016. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 2874-2883 [DOI:10.1109/CVPR.2016.314]
Bodla N, Singh B, Chellappa R and Davis L S. 2017. Soft-NMS: improving object detection with one line of code//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 5562-5570 [DOI:10.1109/ICCV.2017.593]
Cai Z W, Fan Q F, Feris R S and Vasconcelos N. 2016. A unified multi-scale deep convolutional neural network for fast object detection//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 354-370 [DOI:10.1007/978-3-319-46493-0_22]
Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 6154-6162 [DOI:10.1109/CVPR.2018.00644]
Carreira J and Sminchisescu C. 2012. CPMC: automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1312-1328 [DOI:10.1109/TPAMI.2011.231]
Chen C Y, Liu M Y, Tuzel O and Xiao J X. 2017a. R-CNN for small object detection//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer: 214-230 [DOI:10.1007/978-3-319-54193-8_14]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI:10.1109/TPAMI.2017.2699184]
Chen L C, Papandreou G, Schroff F and Adam H. 2017b. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1706.05587.pdf
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 1800-1807 [DOI:10.1109/CVPR.2017.195]
Cui L S, Ma R, Lv P, Jiang X H, Gao Z M, Zhou B and Xu M L. 2018. MDSSD: multi-scale deconvolutional single shot detector for small objects [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1805.07009.pdf
Dai J F, Li Y, He K M and Sun J. 2016. R-FCN: object detection via region-based fully convolutional networks//Proceedings of the 30th Conference on Neural Information Processing Systems. Barcelona: NIPS: 379-387
Dai J F, Qi H Z, Xiong Y W, Li Y, Zhang G D, Hu H and Wei Y C. 2017. Deformable convolutional networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 764-773 [DOI:10.1109/ICCV.2017.89]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego: IEEE: 886-893 [DOI:10.1109/CVPR.2005.177]
Dean T, Ruzon M A, Segal M, Shlens J, Vijayanarasimhan S and Yagnik J. 2013. Fast, accurate detection of 100 000 object classes on a single machine//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE: 1814-1821 [DOI:10.1109/CVPR.2013.237]
Deng J K, Guo J, Xue N N and Zafeiriou S. 2018. ArcFace: additive angular margin loss for deep face recognition [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1801.07698.pdf
Dollár P, Wojek C, Schiele B and Perona P. 2012. Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4): 743-761 [DOI:10.1109/TPAMI.2011.155]
Du C B, Gao S S, Liu Y and Gao B B. 2019. Multi-focus image fusion using deep support value convolutional neural network. Optik, 176: 567-578 [DOI:10.1016/j.ijleo.2018.09.089]
Duan K W, Bai S, Xie L X, Qi H G, Huang Q M and Tian Q. 2019. CenterNet: keypoint triplets for object detection [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1904.08189.pdf
Everingham M, van Gool L, Williams C K I, Winn J and Zisserman A. 2010. The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2): 303-338 [DOI:10.1007/s11263-009-0275-4]
Felzenszwalb P, McAllester D and Ramanan D. 2008. A discriminatively trained, multiscale, deformable part model//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage: IEEE: 1-8 [DOI:10.1109/CVPR.2008.4587597]
Fu C Y, Liu W, Ranga A, Tyagi A and Berg A C. 2017. DSSD: deconvolutional single shot detector [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1701.06659.pdf
Geiger A, Lenz P and Urtasun R. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE: 3354-3361 [DOI:10.1109/CVPR.2012.6248074]
Gidaris S and Komodakis N. 2015. Object detection via a multi-region and semantic segmentation-aware CNN model//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE: 1134-1142 [DOI:10.1109/ICCV.2015.135]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE: 1440-1448 [DOI:10.1109/ICCV.2015.169]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE: 580-587 [DOI:10.1109/CVPR.2014.81]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 2980-2988 [DOI:10.1109/ICCV.2017.322]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI:10.1109/TPAMI.2015.2389824]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 770-778 [DOI:10.1109/CVPR.2016.90]
Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1704.04861.pdf
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 7132-7141 [DOI:10.1109/CVPR.2018.00745]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 2261-2269 [DOI:10.1109/CVPR.2017.243]
Iandola F N, Han S, Moskewicz M W, Ashraf K, Dally W J and Keutzer K. 2017. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1602.07360.pdf
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2015. Spatial transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press: 2017-2025
Jeong J, Park H and Kwak N. 2017. Enhancement of SSD by concatenating feature maps for object detection//Proceedings of 2017 British Machine Vision Conference. London: BMVA Press [DOI:10.5244/c.31.76]
Jia X, Gavves E, Fernando B and Tuytelaars T. 2015. Guiding long-short term memory for image caption generation [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1509.04942.pdf
Kong T, Yao A B, Chen Y R and Sun F C. 2016. HyperNet: towards accurate region proposal generation and joint object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 845-853 [DOI:10.1109/CVPR.2016.98]
Kong T, Sun F C, Liu H P, Jiang Y N and Shi J B. 2019. FoveaBox: beyond anchor-based object detector [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1904.03797.pdf
Law H and Deng J. 2018. CornerNet: detecting objects as paired keypoints//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer: 765-781 [DOI:10.1007/978-3-030-01264-9_45]
Law H, Teng Y, Russakovsky O and Deng J. 2019. CornerNet-Lite: efficient keypoint based object detection [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1904.08900.pdf
Le M H, Woo B S and Jo K H. 2011. A comparison of SIFT and Harris corner features for correspondence points matching//Proceedings of the 17th Korea-Japan Joint Workshop on Frontiers of Computer Vision. Ulsan: IEEE: 1-4 [DOI:10.1109/FCV.2011.5739748]
Li Y H, Chen Y T, Wang N Y and Zhang Z X. 2019. Scale-aware trident networks for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE: 6053-6062 [DOI:10.1109/ICCV.2019.00615]
Li Z X and Zhou F Q. 2017. FSSD: feature fusion single shot multibox detector [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1712.00960.pdf
Liau H, Nimmagadda Y and Wong Y. 2018. Fire SSD: wide fire modules based single shot detector on edge device [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1806.05363.pdf
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017a. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 936-944 [DOI:10.1109/CVPR.2017.106]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017b. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 2999-3007 [DOI:10.1109/ICCV.2017.324]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zürich: Springer: 740-755 [DOI:10.1007/978-3-319-10602-1_48]
Liu S T, Huang D and Wang Y H. 2018. Receptive field block net for accurate and fast object detection//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer: 404-419 [DOI:10.1007/978-3-030-01252-6_24]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 21-37 [DOI:10.1007/978-3-319-46448-0_2]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 3431-3440 [DOI:10.1109/CVPR.2015.7298965]
Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 483-499 [DOI:10.1007/978-3-319-46484-8_29]
Papandreou G, Kokkinos I and Savalle P A. 2015. Modeling local and global deformations in deep learning: epitomic convolution, multiple instance learning, and sliding window detection//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 390-399 [DOI:10.1109/CVPR.2015.7298636]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 779-788 [DOI:10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 6517-6525 [DOI:10.1109/CVPR.2017.690]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI:10.1109/TPAMI.2016.2577031]
Ren X F and Ramanan D. 2013. Histograms of sparse codes for object detection//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE: 3246-3253 [DOI:10.1109/CVPR.2013.417]
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S A, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C and Li F F. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252 [DOI:10.1007/s11263-015-0816-y]
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 4510-4520 [DOI:10.1109/CVPR.2018.00474]
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 815-823 [DOI:10.1109/CVPR.2015.7298682]
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R and LeCun Y. 2014. OverFeat: integrated recognition, localization and detection using convolutional networks [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1312.6229.pdf
Sermanet P, Kavukcuoglu K, Chintala S and LeCun Y. 2013. Pedestrian detection with unsupervised multi-stage feature learning//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE: 3626-3633 [DOI:10.1109/CVPR.2013.465]
Shen Z Q, Liu Z, Li J G, Jiang Y G, Chen Y R and Xue X Y. 2017. DSOD: learning deeply supervised object detectors from scratch//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 1937-1945 [DOI:10.1109/ICCV.2017.212]
Shrivastava A, Gupta A and Girshick R. 2016. Training region-based object detectors with online hard example mining [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1604.03540.pdf
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1409.1556.pdf
Singh B and Davis L S. 2018. An analysis of scale invariance in object detection-SNIP//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 3578-3587 [DOI:10.1109/CVPR.2018.00377]
Singh B, Najibi M and Davis L S. 2018. SNIPER: efficient multi-scale training//Proceedings of the 32nd Conference on Neural Information Processing Systems. Montréal: NeurIPS: 9310-9320
Sun Y, Liang D, Wang X G and Tang X O. 2015. DeepID3: face recognition with very deep neural networks [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1502.00873.pdf
Tang H, Xiao B, Li W S and Wang G Y. 2017. Pixel convolutional neural network for multi-focus image fusion. Information Sciences, 433-434: 125-141 [DOI:10.1016/j.ins.2017.12.043]
Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1904.01355.pdf
Tychsen-Smith L and Petersson L. 2018. Improving object localization with fitness NMS and bounded IoU loss//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 6877-6885 [DOI:10.1109/CVPR.2018.00719]
Uijlings J R R, van de Sande K E A, Gevers T and Smeulders A W M. 2013. Selective search for object recognition. International Journal of Computer Vision, 104(2): 154-171 [DOI:10.1007/s11263-013-0620-5]
van Etten A. 2018. You only look twice: rapid multi-scale object detection in satellite imagery [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1805.09512.pdf
Xiao J X, Ehinger K A, Hays J, Torralba A and Oliva A. 2016. SUN database: exploring a large collection of scene categories. International Journal of Computer Vision, 119(1): 3-22 [DOI:10.1007/s11263-014-0748-y]
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 5987-5995 [DOI:10.1109/CVPR.2017.634]
Xu K, Ba J L, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R S and Bengio Y. 2015. Show, attend and tell: neural image caption generation with visual attention//Proceedings of the 32nd International Conference on Machine Learning. Lille: JMLR: 2048-2057
Yang B, Yan J J, Lei Z and Li S Z. 2016. CRAFT objects from images//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 6043-6051 [DOI:10.1109/CVPR.2016.650]
Yu J H, Jiang Y N, Wang Z Y, Cao Z M and Huang T. 2016. UnitBox: an advanced object detection network//Proceedings of the 24th ACM International Conference on Multimedia. Amsterdam: ACM: 516-520 [DOI:10.1145/2964284.2967274]
Zhang S, Gong Y H and Wang J J. 2019. The development of deep convolution neural network and its applications on computer vision. Chinese Journal of Computers, 42(3): 453-482 (in Chinese) [DOI:10.11897/SP.J.1016.2019.00453]
Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018. Single-shot refinement neural network for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 4203-4212 [DOI:10.1109/CVPR.2018.00442]
Zhang S S, Benenson R and Schiele B. 2017. CityPersons: a diverse dataset for pedestrian detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 4457-4465 [DOI:10.1109/CVPR.2017.474]
Zhang Y, Li B H, Lu H C, Irie A and Ruan X. 2016. Sample-specific SVM learning for person re-identification//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 1278-1287 [DOI:10.1109/CVPR.2016.143]
Zhang Y T, Sohn K, Villegas R, Pan G and Lee H. 2015. Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 249-258 [DOI:10.1109/CVPR.2015.7298621]
Zheng L W, Fu C M and Zhao Y. 2018. Extend the shallow part of single shot multibox detector via convolutional neural network//Proceedings of SPIE 10806, 10th International Conference on Digital Image Processing. Shanghai: SPIE: #1080613 [DOI:10.1117/12.2503001]
Zhong Z Y, Sun L and Huo Q. 2019. An anchor-free region proposal network for Faster R-CNN-based text detection approaches. International Journal on Document Analysis and Recognition, 22(3): 315-327 [DOI:10.1007/s10032-019-00335-y]
Zhou X Y, Wang D Q and Krähenbühl P. 2019. Objects as points [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1904.07850.pdf
Zhu C C, He Y H and Savvides M. 2019. Feature selective anchor-free module for single-shot object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE: 840-849 [DOI:10.1109/CVPR.2019.00093]
Zhu P F, Wen L Y, Bian X, Ling H B and Hu Q H. 2018. Vision meets drones: a challenge [EB/OL]. [2019-12-01]. https://arxiv.org/pdf/1804.07437.pdf
Zhu Y S, Zhao C Y, Wang J Q, Zhao X, Wu Y and Lu H Q. 2017. CoupleNet: coupling global structure with local parts for object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 4146-4154 [DOI:10.1109/ICCV.2017.444]
Zitnick C L and Dollár P. 2014. Edge boxes: locating object proposals from edges//Proceedings of the 13th European Conference on Computer Vision. Zurich: Springer: 391-405 [DOI:10.1007/978-3-319-10602-1_26]