Published online: 2018-11-16 DOI: 10.11834/jig.180275 2018 | Volume 23 | Number 11 | Image Analysis and Recognition

Received: 2018-05-08; Revised: 2018-06-25. Funding: National Natural Science Foundation of China (61502168); Natural Science Foundation of Hebei Province (F2016502069). First author: Zhao Wenqing, born 1973, female, professor, Ph.D.; her research interests include machine learning, data mining, Bayesian network learning, and intelligent techniques applied to power systems. E-mail: jbzwq@126.com. Shao Xuqiang, male, associate professor; his research interests include computer animation, GPU parallel computing, and 3D reconstruction. E-mail: shaoxuqiang@163.com. CLC number: TP391. Document code: A. Article ID: 1006-8961(2018)11-1676-10


Object detection based on improved non-maximum suppression algorithm
Zhao Wenqing, Yan Hai, Shao Xuqiang
North China Electric Power University, Baoding 071003, China
Supported by: National Natural Science Foundation of China(61502168); Natural Science Foundation of Hebei Province, China(F2016502069)

# Abstract

Objective Object detection is a popular research topic in computer vision and an essential component of security video surveillance systems and other computer vision applications. Image recognition based on convolutional neural networks has achieved remarkable results. Many current deep-learning object detection pipelines can be divided into three stages: 1) extract region proposals, 2) classify and refine each region proposal, and 3) remove redundant detection boxes that may belong to the same object. The non-maximum suppression (NMS) algorithm, frequently used in stage 3, is an essential part of object detection and achieves impressive results. Although NMS is a core part of object detection, most studies have focused on feature design, classifier design, and object proposals, and few studies on the NMS algorithm exist. The NMS algorithm is used as a post-processing step of object detection to remove redundant detection boxes. However, it suppresses every detection box whose intersection-over-union (IoU) overlap with the pre-selected box exceeds a threshold. NMS may therefore remove a true positive detection box that overlaps the pre-selected box with a high IoU, and it may preserve a false positive detection box whose IoU with the pre-selected box is low. Mean average precision (mAP) decreases as a result of these missed and false detections. The traditional NMS algorithm is also called GreedyNMS, and GreedyNMS easily causes missed and false detections. Method To overcome these shortcomings, an improved NMS algorithm is proposed that assigns a proportional penalty coefficient according to the IoU value to reduce detection scores. The improved NMS algorithm includes the piecewise and the continuous proportional penalty factor NMS algorithms.
The piecewise proportional penalty factor NMS algorithm reduces the scores of detection boxes whose IoU with the pre-selected box exceeds the threshold T; boxes with IoU below T keep their original scores. Boxes whose scores fall below another threshold σ are removed over the iterations. The performance of this algorithm remains limited by the choice of T. The continuous proportional penalty factor NMS algorithm no longer uses the threshold T; in each iteration it directly reduces the scores of all detection boxes except the one with the maximum score, so the threshold only slightly affects its performance. The improved NMS algorithm first computes the proportional penalty factor of each detection box from its IoU with the pre-selected box, then multiplies the confidence score of each box by its penalty factor, and finally removes the boxes whose scores fall below the threshold after the iterations. Both the piecewise and the continuous proportional penalty factor NMS algorithms are applied in each iteration of the post-processing step of object detection rather than inside the region proposal network. The threshold of the continuous proportional penalty factor algorithm affects its performance far less than the threshold of GreedyNMS does. In addition, the computational complexity of the improved NMS algorithm is O(n²), the same as that of GreedyNMS, where $n$ is the number of detection boxes. Result The experiments are based on Faster RCNN with a VGG16 backbone on PASCAL VOC 2007, which has 20 object categories. We train the models on the VOC 2007 trainval set and evaluate on the VOC 2007 test set.
Object detection accuracy is measured by mAP. Plugged into a basic Faster RCNN, the improved NMS algorithm obtains significant improvements on standard datasets such as PASCAL VOC (1.5% for the piecewise and 1.6% for the continuous proportional penalty factor NMS algorithm). Compared with GreedyNMS, the piecewise proportional penalty factor NMS algorithm improves mAP by up to 1.5% when the threshold is 0.3 or 0.4; however, its performance remains limited by the threshold choice. The continuous proportional penalty NMS algorithm therefore weakens the influence of the threshold: compared with GreedyNMS it improves mAP by up to 1.6%, and its performance is far less sensitive to the threshold. The missed and false detection rates decrease by 1.8% and 1.2%, respectively, when the precision and recall rates are 80%. Conclusion The traditional NMS algorithm can easily discard true positive detection boxes and preserve false positive ones. An improved NMS algorithm, comprising the piecewise and the continuous proportional penalty NMS algorithms, is proposed. Compared with the traditional NMS algorithm, the improved algorithm effectively preserves correct detection boxes, removes false positives, and reduces the missed and false detection rates. In addition, the improved and the traditional NMS algorithms have the same time complexity and similar running efficiency. Experiments show that the detection performance of Faster RCNN is significantly improved by the improved NMS algorithm.
Future work will continue to improve the algorithm to obtain stronger generalization in single-stage detection models; the algorithm also remains applicable to other object detection models.

# Key words

object detection; non-maximum suppression algorithm; detection boxes; scale factor; false positives

# 1 Faster RCNN object detection model

The Faster RCNN detection pipeline consists of two parts, the RPN model and the Fast RCNN model, as shown in Fig. 1.

The main role of the RPN model is to replace the selective search (SS) algorithm for extracting object proposals from the image, and to classify and regress the position of each proposal. In classification, the RPN only separates proposals into two classes, foreground and background. The RPN first extracts features with a pre-trained image classification model, takes the feature map of the last convolutional layer, maps each point on the feature map back to the input image, and generates 9 rectangular detection boxes of different scales centered on that point. It then classifies and regresses each box, and finally applies the NMS algorithm to remove heavily overlapping boxes, keeping the higher-scoring ones as input to the Fast RCNN model. The RPN can therefore fully replace the SS algorithm, with a clear speed advantage in extracting detection boxes.
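The 9 boxes per feature-map point can be sketched as follows. The paper does not list the anchor scales and aspect ratios, so the values below (three scales × three ratios, as in the original Faster R-CNN) are illustrative assumptions:

```python
def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return the 9 (x1, y1, x2, y2) anchors centered on image point (cx, cy).

    scales/ratios are assumed values; 3 scales x 3 ratios gives 9 boxes.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            # keep the anchor area close to s*s while varying aspect ratio r = h/w
            w = s / r ** 0.5
            h = s * r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

In the RPN this is repeated for every feature-map location, so a W×H feature map yields 9·W·H candidate boxes before classification and NMS.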

$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{\rm cls}}\sum\limits_i L_{\rm cls}(p_i, p_i^*) + \mu \frac{1}{N_{\rm reg}}\sum\limits_i p_i^* L_{\rm reg}(t_i, t_i^*)$ (1)
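Eq. (1) is the standard Faster R-CNN multi-task loss. A minimal numeric sketch, assuming log loss for $L_{\rm cls}$ and smooth-L1 for $L_{\rm reg}$ as in the Faster R-CNN paper; the balancing weight `mu` and the per-term normalization are simplified assumptions here:

```python
import math

def smooth_l1(x):
    # standard smooth-L1: quadratic near zero, linear elsewhere
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_loss(p, p_star, t, t_star, mu=10.0):
    """p: predicted object probabilities; p_star: 0/1 anchor labels;
    t, t_star: predicted / target box offsets as 4-tuples.
    The regression term is gated by p_star, so only positive anchors count."""
    n_cls, n_reg = len(p), len(t)
    l_cls = sum(-(ps * math.log(pi) + (1 - ps) * math.log(1 - pi))
                for pi, ps in zip(p, p_star)) / n_cls
    l_reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
                for ti, ts, ps in zip(t, t_star, p_star)) / n_reg
    return l_cls + mu * l_reg
```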

Fast RCNN consists of three parts. The first part is the pre-trained network that extracts the feature map: in the Fast RCNN detection model, region detection boxes are first extracted by the RPN model, and their corresponding regions on the feature map are then located. The second part is the RoI (region of interest) pooling layer: detection boxes of different sizes correspond to feature-map regions of different scales, so the RoI pooling layer resizes them to a common scale before feeding them to the fully connected layers. The third part is the NMS algorithm, which merges the detection boxes of the same object: before this post-processing each object has multiple detection boxes, and NMS removes the redundant ones.

# 2.1 Non-maximum suppression algorithm [14]

1) Input: the set of detection boxes $\mathit{\boldsymbol{P}}$, the set of their confidence scores $\mathit{\boldsymbol{C}}$, and the set of finally kept detection boxes $\mathit{\boldsymbol{K}}$.

2) Sort $\mathit{\boldsymbol{P}}$ by the corresponding scores in $\mathit{\boldsymbol{C}}$ in descending order, take the first detection box as the suppressing box [19], add it to $\mathit{\boldsymbol{K}}$, and remove it from $\mathit{\boldsymbol{P}}$.

3) Compute the ${\rm{IoU}}$ of every remaining box with this first box, and remove from $\mathit{\boldsymbol{P}}$ all boxes whose ${\rm{IoU}}$ exceeds the given threshold $T$.

4) Repeat steps 2) and 3) until $\mathit{\boldsymbol{P}}$ is empty.
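Steps 1) to 4) can be sketched as a short Python function; boxes are assumed to be (x1, y1, x2, y2) tuples, and the function returns the indices of the kept boxes:

```python
def greedy_nms(boxes, scores, T=0.5):
    """GreedyNMS as described in steps 1)-4) above."""
    def iou(a, b):
        # intersection-over-union of two axis-aligned boxes
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    # P: candidate indices sorted by descending score; K: kept indices
    P = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    K = []
    while P:
        m = P.pop(0)   # highest-scoring box becomes the suppressing box
        K.append(m)
        # remove every remaining box whose IoU with box m exceeds T
        P = [i for i in P if iou(boxes[m], boxes[i]) <= T]
    return K
```

Note the hard cutoff: a box with IoU just above $T$ is discarded outright, even if it covers a second, adjacent object; this is exactly the failure mode the improved algorithms below address.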

# 2.2.1 Piecewise proportional penalty factor NMS (Seg-NMS) algorithm

${\mu _i} = f({\rm{IoU}}({p_m}, {p_i})) = \left\{ \begin{array}{ll} 1 & {\rm{IoU}}({p_m}, {p_i}) < T\\ 1-\lg \left( {{\rm{IoU}}\left( {{p_m}, {p_i}} \right) + 1} \right) & {\rm{IoU}}\left( {{p_m}, {p_i}} \right) \ge T \end{array} \right.$ (2)
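Eq. (2) maps directly to a small helper. Because ${\rm IoU} \in [0, 1]$, the penalty factor stays within $(1 - \lg 2, 1]$, so a score is attenuated but never zeroed in a single step:

```python
import math

def seg_penalty(iou, T):
    """Piecewise proportional penalty factor mu_i of Eq. (2):
    1 when IoU < T, otherwise 1 - lg(IoU + 1)."""
    if iou < T:
        return 1.0
    return 1.0 - math.log10(iou + 1.0)
```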

# 2.2.2 Continuous proportional penalty factor NMS (Con-NMS) algorithm

while $\mathit{\boldsymbol{P}} \ne \emptyset$ // while the set is non-empty

${p_m} = {\rm{max}}\left(\mathit{\boldsymbol{P}} \right)$ // box with the maximum score

$\mathit{\boldsymbol{P}} = \mathit{\boldsymbol{P}}-{p_m}$ // remove box ${p_m}$

$\mathit{\boldsymbol{K}} = \mathit{\boldsymbol{K}} \cup {p_m}$ // add box ${p_m}$ to set $\mathit{\boldsymbol{K}}$

for ${p_i}$ in $\mathit{\boldsymbol{P}}$ // loop over the remaining boxes

${\mu _i} = f({\rm{IoU}}({p_m}, {p_i}))$ // compute the proportional penalty factor

${c_i} = {\mu _i}{c_i}$ // update the confidence score

if ${c_i} < \sigma$ // score falls below $σ$

$\mathit{\boldsymbol{P}} = \mathit{\boldsymbol{P}}-{p_i}$ // remove box ${p_i}$

return $\mathit{\boldsymbol{K}}$ // return the final set of kept boxes $\mathit{\boldsymbol{K}}$
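The pseudocode above can be sketched as a runnable Python function. The continuous penalty factor is assumed here to be the logarithmic factor $1 - \lg({\rm IoU} + 1)$ applied to every remaining box, i.e. Eq. (2) without the threshold branch; box layout (x1, y1, x2, y2) is also an assumption:

```python
import math

def con_nms(boxes, scores, sigma=0.001):
    """Con-NMS sketch following the pseudocode above; returns kept indices."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    P = list(range(len(boxes)))          # candidate box indices
    c = list(scores)                     # working copy of confidence scores
    K = []                               # kept boxes
    while P:
        m = max(P, key=lambda i: c[i])   # p_m: highest-scoring remaining box
        P.remove(m)                      # P = P - p_m
        K.append(m)                      # K = K ∪ p_m
        survivors = []
        for i in P:
            mu = 1.0 - math.log10(iou(boxes[m], boxes[i]) + 1.0)  # mu_i
            c[i] *= mu                   # c_i = mu_i * c_i
            if c[i] >= sigma:            # drop boxes whose score falls below sigma
                survivors.append(i)
        P = survivors
    return K
```

Unlike GreedyNMS, no box is removed outright for overlapping $p_m$; an overlapping box only disappears once repeated penalties push its score below $\sigma$, which is why the choice of $\sigma$ affects the result far less than the choice of $T$ in GreedyNMS.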

# 3 Experimental analysis

The training of Faster RCNN is divided into four stages:

1) Train the RPN model from a pre-trained network and output a large number of detection boxes;

2) Train the Fast RCNN model using the same pre-trained network as in stage 1) together with the detection boxes it output. Because stages 1) and 2) are trained separately, the parameters of the pre-trained network in the two stages will differ at this point;

3) Initialize the pre-trained network of stage 1) with the parameters from stage 2), freeze these shared parameters, and retrain the RPN model, fine-tuning only the layers unique to the RPN;

4) Initialize Fast RCNN with the network parameters from stage 3), freeze them, and fine-tune only the layers unique to Fast RCNN.

# 3.1 Experimental data and parameters

The RPN can be trained end-to-end. Each mini-batch contains 256 samples with a 1 :1 positive-to-negative ratio; when there are not enough positive samples, the batch is padded with negatives. The layers of the network are initialized from a Gaussian distribution with mean 0 and standard deviation 0.01. The initial learning rate is 0.001 for 60 k iterations and is then reduced to 0.000 1 for another 20 k iterations; momentum and weight decay are 0.9 and 0.000 5, respectively.
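The step schedule described above can be written as a tiny framework-agnostic helper (illustrative only):

```python
def learning_rate(iteration):
    """Learning-rate schedule from Section 3.1:
    0.001 for the first 60k iterations, then 0.0001 for the next 20k."""
    return 0.001 if iteration < 60000 else 0.0001
```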

# 3.2 Experimental results of the Seg-NMS and Con-NMS algorithms

Table 1 Experimental results of the Seg-NMS algorithm when the parameter $σ$ is 0.001

| $T$ | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
| --- | --- | --- | --- | --- | --- |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 56.4 |

Table 2 Experimental results of the Con-NMS algorithm

| $σ$ | 0.001 | 0.002 | 0.003 | 0.004 | 0.005 |
| --- | --- | --- | --- | --- | --- |
| Con-NMS/% | 71.6 | 71.3 | 71.2 | 71.2 | 71.1 |

# 3.3 Performance analysis

Table 3 Experimental results of the GreedyNMS [14] algorithm and the Seg-NMS algorithm

| Algorithm \ $T$ | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
| --- | --- | --- | --- | --- | --- |
| GreedyNMS [14]/% | 69.6 | 70.0 | 69.1 | 64.4 | 56.6 |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 57.0 |

Table 4 Detection results of the improved NMS algorithm and the traditional NMS [14] algorithm on the VOC 2007 test set

| Algorithm | mAP | bicycle | boat | car | chair | table | motorbike | person | plant | sheep | train |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GreedyNMS [14]/% | 70.0 | 78.7 | 51.5 | 80.1 | 50.0 | 64.7 | 76.1 | 77.0 | 38.4 | 68.7 | 75.1 |
| Seg-NMS/% | 71.5 | 81.4 | 60.2 | 82.6 | 52.8 | 66.3 | 76.2 | 79.2 | 43.8 | 70.3 | 79.2 |
| Con-NMS/% | 71.6 | 82.0 | 57.9 | 81.6 | 52.4 | 66.8 | 78.7 | 79.3 | 43.0 | 71.5 | 80.9 |

Note: bold font in the original table marks the categories in which Seg-NMS and Con-NMS clearly improve detection accuracy over GreedyNMS.

# References

• [1] Zhang H, Wang K F, Wang F Y. Advances and perspectives on applications of deep learning in visual object detection[J]. Acta Automatica Sinica, 2017, 43(8): 1289–1305. [张慧, 王坤峰, 王飞跃. 深度学习在目标视觉检测中的应用进展与展望[J]. 自动化学报, 2017, 43(8): 1289–1305. ] [DOI:10.16383/j.aas.2017.c160822]
• [2] Cai N, Zhou Y, Liu G, et al. Survey of robust principal component analysis methods for moving-object detection[J]. Journal of Image and Graphics, 2016, 21(10): 1265–1275. [蔡念, 周杨, 刘根, 等. 鲁棒主成分分析的运动目标检测综述[J]. 中国图象图形学报, 2016, 21(10): 1265–1275. ] [DOI:10.11834/jig.20161001]
• [3] Zhang S F, Huang X H, Wang M. Algorithm of infrared background suppression and small target detection[J]. Journal of Image and Graphics, 2016, 21(8): 1039–1047. [张世锋, 黄心汉, 王敏. 红外背景抑制与小目标检测算法[J]. 中国图象图形学报, 2016, 21(8): 1039–1047. ] [DOI:10.11834/jig.20160808]
• [4] Shi X B, Zhang J, Dai Q, et al. A deformed object tracking method utilizing saliency segmentation and target detection[J]. Journal of Computer-Aided Design & Computer Graphics, 2016, 28(4): 644–652. [石祥滨, 张健, 代钦, 等. 采用显著性分割与目标检测的形变目标跟踪方法[J]. 计算机辅助设计与图形学学报, 2016, 28(4): 644–652. ] [DOI:10.3969/j.issn.1003-9775.2016.04.015]
• [5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 580-587.[DOI: 10.1109/CVPR.2014.81]
• [6] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 21-37.[DOI: 10.1007/978-3-319-46448-0_2]
• [7] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 779-788.[DOI: 10.1109/CVPR.2016.91]
• [8] van de Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1879-1886.[DOI: 10.1109/ICCV.2011.6126456]
• [9] Ren S Q, He K M, Girshick R, et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. [DOI:10.1109/TPAMI.2016.2577031]
• [10] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1440-1448.[DOI: 10.1109/ICCV.2015.169]
• [11] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904–1916. [DOI:10.1109/TPAMI.2015.2389824]
• [12] Shen Z Q, Liu Z, Li J G, et al. DSOD: learning deeply supervised object detectors from scratch[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1937-1945.[DOI: 10.1109/ICCV.2017.212]
• [13] Hu H, Gu J Y, Zhang Z, et al. Relation networks for object detection[J]. arXiv: 1711.11575, 2018.
• [14] Rosenfeld A, Thurston M. Edge and curve detection for visual scene analysis[J]. IEEE Transactions on Computers, 1971, C-20(5): 562–569. [DOI:10.1109/T-C.1971.223290]
• [15] Hosang J, Benenson R, Schiele B. Learning non-maximum suppression[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 6469-6477.[DOI: 10.1109/CVPR.2017.685]
• [16] Stewart R, Andriluka M, Ng A Y. End-to-end people detection in crowded scenes[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2325-2333.[DOI: 10.1109/CVPR.2016.255]
• [17] Henderson P, Ferrari V. End-to-end training of object class detectors for mean average precision[C]//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer, 2016: 198-213.[DOI: 10.1007/978-3-319-54193-8_13]
• [18] Hosang J, Benenson R, Schiele B. A convnet for non-maximum suppression[C]//Proceedings of the 38th German Conference on Pattern Recognition. Hannover, Germany: Springer, 2016: 192-204.[DOI: 10.1007/978-3-319-45886-1_16]
• [19] Chen J H, Ye X N. Improvement of non-maximum suppression in pedestrian detection[J]. Journal of East China University of Science and Technology:Natural Science Edition, 2015, 41(3): 371–378. [陈金辉, 叶西宁. 行人检测中非极大值抑制算法的改进[J]. 华东理工大学学报:自然科学版, 2015, 41(3): 371–378. ] [DOI:10.3969/j.issn.1006-3080.2015.03.015]
• [20] Everingham M, van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338.
• [21] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of 2015 International Conference on Learning Representations. 2015.