Object detection based on improved non-maximum suppression algorithm

Wenqing Zhao; Hai Yan; Xuqiang Shao

doi:10.11834/jig.180275

Image Analysis and Recognition


2018, Vol. 23, No. 11, pp. 1676-1685
Received: 2018-05-08

Revised: 2018-06-25

Published in print: 2018-11-16

Wenqing Zhao, Hai Yan, Xuqiang Shao. Object detection based on improved non-maximum suppression algorithm[J]. Journal of Image and Graphics, 2018, 23(11): 1676-1685. DOI: 10.11834/jig.180275.

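Abstract (translated from Chinese)

Objective

As a post-processing step in object detection, the non-maximum suppression (NMS) algorithm removes redundant detection boxes. However, in each iteration NMS suppresses every detection box whose intersection-over-union (IoU) with the pre-selected detection box exceeds a given threshold, which easily leads to missed and false detections. In addition, the choice of threshold has a critical effect on the performance of the whole algorithm. To address this problem, this paper proposes two improved NMS algorithms: the piecewise proportional penalty factor NMS algorithm and the continuous proportional penalty factor NMS algorithm. In the continuous proportional penalty factor NMS algorithm, the threshold has only a slight effect on performance.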

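Method

The improved NMS algorithm first computes a proportional penalty factor for each detection box from its IoU with the pre-selected detection box. It then multiplies the confidence score of the detection box by this penalty factor, so that the score is reduced round by round. Finally, after multiple iterations, detection boxes whose scores are below the threshold are removed.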
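The score update described above can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $$M$$ is the pre-selected (highest-scoring) box in the current round, $$b_i$$ is any other remaining box with score $$s_i$$, and $$f$$ is the proportional penalty factor, whose exact form is not given in this abstract.

$$
s_i \leftarrow s_i \cdot f\big(\mathrm{IoU}(M, b_i)\big), \qquad 0 \le f \le 1
$$

For instance, with a hypothetical penalty factor of 0.5, a box with score 0.8 drops to $$0.8 \times 0.5 = 0.4$$ after one round and to $$0.4 \times 0.5 = 0.2$$ if the same factor applies in a second round. The box is removed only once its score falls below the score threshold, whereas traditional NMS would discard it in the first round if its IoU exceeded the threshold.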

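Result

With the Faster RCNN object detection model on the PASCAL VOC 2007 dataset, the piecewise and the continuous proportional penalty factor NMS algorithms raise the mean average precision (mAP) by 1.5% and 1.6%, respectively, relative to the traditional NMS algorithm. Taking the train class as an example, when precision and recall are both 80%, the missed detection rate and the false detection rate for the train class fall by 1.8% and 1.2%, respectively. Compared with the traditional NMS algorithm, the proposed improved NMS algorithms effectively retain object detection boxes and remove false-positive detection boxes, thereby lowering the missed and false detection rates of NMS.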

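Conclusion

With the same time complexity and comparable running efficiency, the proposed improved NMS algorithms achieve a significantly higher mAP than the traditional NMS algorithm. They also offer a general solution for other object detection models.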

Abstract

Objective

Object detection is a popular research topic in computer vision and an essential component of security video surveillance systems and other computer vision applications. Image recognition based on convolutional neural networks has achieved remarkable results. Many current deep-learning object detection pipelines can be divided into three stages: 1) extract region proposals, 2) classify and refine each region proposal, and 3) remove extra detection boxes that might belong to the same object. The non-maximum suppression (NMS) algorithm is frequently used in Stage 3 as an essential part of object detection and performs well. Although NMS is a core part of object detection, numerous studies have focused on feature design, classifier design, and object proposals, and few have examined NMS itself. NMS is used as a post-processing step of object detection to remove redundant detection boxes. However, it suppresses every detection box whose intersection-over-union (IoU) overlap with the pre-selected detection box is higher than the threshold. NMS may therefore remove a positive detection box that is adjacent to the pre-selected box and has a high IoU with it. It may also preserve a negative detection box because that box has a low IoU with the pre-selected box. These missed detections and false positives lower the mean average precision (mAP). Traditional NMS, which can also be called GreedyNMS, thus easily causes missed and false detections.
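The suppression rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `(x1, y1, x2, y2)` box layout, the function names `iou` and `greedy_nms`, and the default threshold of 0.3 are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return the indices of the kept boxes, highest score first."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        best = remaining.pop(0)  # pre-selected box: highest remaining score
        keep.append(best)
        # Hard decision: every box overlapping the pre-selected box by more
        # than the threshold is discarded outright, whatever its score.
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Boxes 0 and 1 overlap heavily (IoU = 81/119); box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # -> [0, 2]
```

In this toy input, box 1 is dropped even though its score is 0.8. If it actually covered a second, adjacent object, that would be exactly the kind of missed detection the text describes.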

Method

To overcome these shortcomings, an improved NMS algorithm is proposed that assigns a proportional penalty coefficient according to the IoU value and uses it to reduce detection scores. The improved NMS algorithm includes the piecewise and the continuous proportional penalty factor NMS algorithms. The piecewise proportional penalty factor NMS algorithm reduces the scores of detection boxes whose IoU is higher than the IoU threshold. Detection boxes whose IoU is less than the threshold keep their original scores. Detection boxes whose scores are lower than a second, score threshold are removed after many iterations. The performance of this algorithm remains limited by the threshold. The continuous proportional penalty factor NMS algorithm no longer uses the IoU threshold but directly reduces the scores of all detection boxes except the box with the maximum score in each iteration. In the continuous proportional penalty factor NMS algorithm, the threshold only slightly affects performance. The improved NMS algorithm first calculates the proportional penalty factors that correspond to the detection boxes in accordance with their IoU values with the pre-selected detection box. It then multiplies the confidence scores of the detection boxes by the proportional penalty factors, so the scores are reduced progressively over many iterations. Finally, it removes the detection boxes whose scores are below the threshold. Both algorithms are applied in the post-processing step of object detection rather than in the region proposal network. The continuous proportional penalty factor NMS algorithm is less sensitive to its threshold than GreedyNMS is. In addition, the computational complexity of the improved NMS algorithm is $$O(n^2)$$, which is the same as that of GreedyNMS, where $$n$$ is the number of detection boxes.
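The following sketch shows how score decay with a penalty factor might look in code. The abstract does not give the paper's penalty functions, so the two used here are stand-ins borrowed from the Soft-NMS formulation (a linear factor `1 - IoU` above the threshold, and a Gaussian factor with an assumed `sigma`). The function names, box layout, and threshold values are likewise assumptions for illustration.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def piecewise_penalty(overlap, iou_threshold=0.3):
    """Assumed piecewise factor: no penalty below the IoU threshold."""
    return 1.0 if overlap < iou_threshold else 1.0 - overlap


def continuous_penalty(overlap, sigma=0.5):
    """Assumed continuous factor: Gaussian decay, no IoU threshold."""
    return math.exp(-(overlap ** 2) / sigma)


def penalty_nms(boxes, scores, penalty, score_threshold=0.1):
    """Decay scores round by round, then drop boxes below score_threshold.

    Returns (index, final_score) pairs in the order the boxes were selected.
    """
    scores = list(scores)  # work on a copy; the caller's list is untouched
    remaining = list(range(len(boxes)))
    ranked = []
    while remaining:
        # Each round costs O(n), and there are n rounds: O(n^2) overall.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        ranked.append(best)
        for i in remaining:
            scores[i] *= penalty(iou(boxes[best], boxes[i]))
    return [(i, scores[i]) for i in ranked if scores[i] >= score_threshold]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for name, fn in [("piecewise", piecewise_penalty),
                 ("continuous", continuous_penalty)]:
    print(name, penalty_nms(boxes, scores, fn))
```

With these toy inputs, the overlapping box 1 is kept with a reduced score rather than discarded, and it is removed only if the score threshold is raised above its decayed score. Passing the penalty as a function keeps the two variants interchangeable inside the same loop.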

Result

The experiments use Faster RCNN on PASCAL VOC 2007, which has 20 object categories, with VGG16 as the base network. We train the models on the VOC 2007 trainval set and evaluate on the VOC 2007 test set. Object detection accuracy is measured by the mAP. Used in a basic Faster RCNN, the improved NMS algorithms obtain significant improvements on standard datasets such as PASCAL VOC: 1.5% for the piecewise proportional penalty factor NMS algorithm and 1.6% for the continuous proportional penalty factor NMS algorithm. Compared with GreedyNMS, the piecewise proportional penalty factor NMS algorithm improves the mAP by up to 1.5% when the threshold is 0.3 or 0.4. However, its performance remains limited by the choice of threshold. The continuous proportional penalty factor NMS algorithm therefore weakens the influence of the threshold on performance. Compared with GreedyNMS, it improves the mAP by up to 1.6%, and its performance is less sensitive to the threshold. Taking the train class as an example, the missed and false detection rates decrease by 1.8% and 1.2%, respectively, when the precision and recall rates are both 80%.
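For readers unfamiliar with the metric, the sketch below shows one common way to compute it. It assumes the 11-point interpolated average precision defined for the PASCAL VOC 2007 challenge; the abstract does not say which AP variant the authors used, and the function names and sample numbers are hypothetical.

```python
def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP over paired recall/precision values."""
    total = 0.0
    for k in range(11):
        level = k / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= level.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """mAP: the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical two-point precision/recall curve for a single class.
ap = average_precision_11pt([0.5, 1.0], [1.0, 0.5])
print(round(ap, 4))
print(mean_average_precision({"train": 0.8, "car": 0.6}))
```

Because mAP averages over all classes, a gain of 1.5% or 1.6% in mAP reflects the average change across the 20 VOC categories, not the change in any single class.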

Conclusion

The traditional NMS algorithm can easily miss positive detection boxes and preserve negative detection boxes. An improved NMS algorithm, which includes the piecewise and the continuous proportional penalty NMS algorithms, is proposed. Compared with the traditional NMS algorithm, the improved NMS algorithm can effectively preserve the object detection boxes and remove the false positive detection boxes, which reduces the missed and false detection rates of NMS. In addition, the improved and the traditional NMS algorithms have the same time complexity and similar operating efficiency. The experiments show that the detection performance of Faster RCNN is significantly improved by the improved NMS algorithm. The next step is to continue improving the algorithm to obtain better generalization in single-stage detection models. The algorithm also remains applicable to other object detection models.

