Published: 2018-11-16
Image Analysis and Recognition
Received: 2018-05-08; revised: 2018-06-25
Supported by: National Natural Science Foundation of China (61502168); Natural Science Foundation of Hebei Province (F2016502069)
About the first author:
Zhao Wenqing, born in 1973, female, professor, Ph.D. Her research interests include machine learning, data mining, Bayesian network learning, and applications of intelligent techniques in power systems. E-mail: jbzwq@126.com
Shao Xuqiang, male, associate professor. His research interests include computer animation, GPU parallel computing, and 3D reconstruction. E-mail: shaoxuqiang@163.com
CLC number: TP391
Document code: A
Article ID: 1006-8961(2018)11-1676-10
Abstract
Objective As a post-processing step of object detection, the non-maximum suppression (NMS) algorithm is used to remove redundant detection boxes. However, in every iteration NMS suppresses all detection boxes whose intersection-over-union (IoU) with the pre-selected box exceeds a given threshold, which easily causes missed and false detections; moreover, the choice of threshold has a crucial influence on the performance of the whole algorithm. To address this problem, this paper proposes two improved NMS algorithms: the piecewise proportional penalty factor NMS algorithm and the continuous proportional penalty factor NMS algorithm. In the continuous proportional penalty factor NMS algorithm, the threshold only slightly affects the result. Method The improved NMS algorithms first compute a proportional penalty factor for each detection box according to the IoU between the box and the pre-selected box; they then multiply the confidence score of the box by this factor, lowering the score round by round, and finally remove, after several iterations, the boxes whose scores fall below a threshold. Result On the PASCAL VOC 2007 dataset, the Faster RCNN detection model with the piecewise and continuous proportional penalty factor NMS algorithms improves the mean average precision (mAP) by 1.5% and 1.6%, respectively, over the traditional NMS algorithm. Taking the train category as an example, when both precision and recall are 80%, the missed and false detection rates for trains decrease by 1.8% and 1.2%, respectively. Compared with the traditional NMS algorithm, the proposed improved NMS algorithms effectively preserve true detection boxes and remove false-positive boxes, thereby lowering the missed and false detection rates. Conclusion With the same time complexity and running efficiency, the proposed improved NMS algorithms achieve a markedly higher mAP than the traditional NMS algorithm and provide a general solution for other object detection models.
Keywords
object detection; non-maximum suppression algorithm; detection boxes; scale factor; false positives
Abstract
Objective
Object detection has been a popular research topic in computer vision and is an essential component of security video surveillance systems and other computer vision applications. Image recognition based on convolutional neural networks has achieved remarkable results. Many current object detection pipelines based on deep learning can be divided into three stages: 1) extracting region proposals, 2) classifying and refining each region proposal, and 3) removing extra detection boxes that might belong to the same object. The non-maximum suppression (NMS) algorithm is frequently used in Stage 3 as an essential part of object detection and achieves impressive results. Although the NMS algorithm is a core part of object detection, most studies have focused on feature design, classifier design, and object proposals, and few studies on NMS itself exist. The NMS algorithm is used as a post-processing step of object detection to remove redundant detection boxes. However, it suppresses every detection box whose intersection-over-union (IoU) overlap with the pre-selected detection box exceeds the threshold. NMS may therefore remove a true-positive detection box that is adjacent to the pre-selected box with a high IoU value, and it may preserve a false-positive detection box because that box has a low IoU with the pre-selected box. Mean average precision (mAP) decreases as a result of these missed and false positives; the traditional NMS algorithm, also called GreedyNMS, thus easily causes missed and false detections.
Method
To overcome these shortcomings, an improved NMS algorithm is proposed that assigns a proportional penalty coefficient, determined by the IoU value, to reduce detection scores. The improved NMS algorithm includes the piecewise and the continuous proportional penalty factor NMS algorithms. The piecewise proportional penalty factor NMS algorithm reduces the scores of detection boxes whose IoU with the pre-selected box exceeds the threshold T, whereas detection boxes whose IoU is less than T retain their original scores; detection boxes whose scores fall below another threshold σ are removed after several iterations. The performance of this algorithm remains limited by the threshold T. The continuous proportional penalty factor NMS algorithm no longer uses threshold T but directly reduces the scores of all detection boxes, except the one with the maximum score, in each iteration; in this algorithm, the threshold only slightly affects performance. The improved NMS algorithm initially calculates the proportional penalty factors that correspond to the detection boxes according to their IoU values with the pre-selected detection box. It then multiplies the confidence scores of the detection boxes by the proportional penalty factors, reducing the scores round by round, and finally removes the detection boxes whose scores fall below the threshold after several iterations. The piecewise and continuous proportional penalty factor NMS algorithms are applied in each iteration of the post-processing step of object detection rather than in the region proposal network. The threshold in the continuous proportional penalty factor NMS algorithm affects performance far less than the threshold in GreedyNMS does.
In addition, the computational complexity of the improved NMS algorithm is O(n²), the same as that of GreedyNMS, where n denotes the number of detection boxes.
Result
With the piecewise and the continuous proportional penalty factor NMS algorithms, the mAP of Faster RCNN on the PASCAL VOC 2007 dataset increases by 1.5% and 1.6%, respectively, compared with the traditional NMS algorithm. Taking the train category as an example, when both precision and recall are 80%, the missed and false detection rates decrease by 1.8% and 1.2%, respectively. Compared with GreedyNMS, the improved NMS algorithms effectively preserve true-positive detection boxes and remove false-positive ones.
Conclusion
With the same time complexity and running efficiency, the proposed improved NMS algorithms achieve a considerably higher mAP than the traditional NMS algorithm and provide a general solution for other object detection models.
Key words
object detection; non-maximum suppression algorithm; detection boxes; scale factor; false positives
0 Introduction
As a popular research topic in computer vision, object detection is widely applied in video surveillance, autonomous driving, intelligent transportation, and other fields[1-4]. Detection models based on deep convolutional neural networks, chiefly the RCNN[5], SSD[6], and YOLO[7] families, have made remarkable progress in object detection[1]. Such detection algorithms proceed in three steps: first, extract candidate detection boxes, for example with Selective Search[8], sliding windows, or region proposal networks (RPN)[9]; second, classify each detection box and regress its position with a convolutional neural network; third, merge the detection boxes belonging to the same object. The third step, an indispensable part of the detection pipeline, usually employs the NMS algorithm, which has performed well in [5-7, 9-13]. NMS was first applied to edge detection[14] and was later extended to face detection, object detection, and other research fields. However, the traditional NMS algorithm[14] is a greedy algorithm (GreedyNMS): when merging detection boxes, it suppresses every box whose IoU with the pre-selected box exceeds a given threshold, which easily causes missed and false detections.
Therefore, Ref. [15] replaces the NMS algorithm with a convolutional neural network that learns to generate a specific detection box for each object from the box coordinates, width, height, and confidence. Ref. [16] directly generates a sparse set of proposal boxes and trains an LSTM in place of NMS. Ref. [17] proposes a truly end-to-end training scheme that includes the NMS algorithm in the training stage, so that the classifier is aware of NMS at test time. Ref. [18] proposes the Tyrolean network, which rescores all proposal boxes and learns the best output; however, the accuracy gain comes with a large increase in computation, which slows down the whole detection system. Although these methods play a role similar to that of NMS, none of them outperforms it, and some require far more computation than NMS. The NMS algorithm therefore remains the method of choice for merging detection boxes.
Because, at a given threshold T, GreedyNMS may both remove true-positive detection boxes and retain false-positive ones, this paper proposes two improved NMS algorithms, the piecewise proportional penalty factor NMS (Seg-NMS) algorithm and the continuous proportional penalty factor NMS (Con-NMS) algorithm, which lower the scores of detection boxes gradually instead of removing them outright.
1 Faster RCNN object detection model
The Faster RCNN detection pipeline consists of two parts, the RPN model and the Fast RCNN model, as shown in Fig. 1.
The main role of the RPN model is to replace the selective search (SS) algorithm in extracting object proposals from the image and to classify and regress the position of each proposal. In classification, the RPN only separates proposals into two classes, foreground and background. The RPN first extracts features with a pre-trained image classification model: it takes the feature map of the last convolutional layer, maps each point on the feature map back to the corresponding point on the input image, and generates nine rectangular detection boxes of different scales around that point. It then classifies and regresses each detection box and finally applies the NMS algorithm to remove heavily overlapping boxes, passing the high-scoring ones to the Fast RCNN model as input. The RPN can thus fully replace the SS algorithm and extracts detection boxes markedly faster.
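The nine-anchor generation described above can be sketched as follows. The specific scales and aspect ratios are assumptions borrowed from the standard Faster RCNN configuration; the text only states that nine boxes of different scales are generated per feature-map point.

```python
import math

def generate_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate 9 anchors (x1, y1, x2, y2) centered at an image point (cx, cy).

    Each anchor has area scale**2 and height/width ratio r. The scale and
    ratio values are assumed defaults, not taken from this paper.
    """
    anchors = []
    for s in scales:
        for r in ratios:                    # r = height / width
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Three scales times three ratios yield the nine boxes per point that the RPN classifies and regresses.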
During training, the RPN usually labels a rectangular proposal as a positive sample when its IoU with a ground-truth box exceeds 0.7 and as a negative sample when the IoU is below 0.3; the remaining proposals do not contribute to training. The RPN is trained by minimizing the multi-task loss
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{\rm cls}}\sum\limits_i L_{\rm cls}(p_i, p_i^*) + \mu\frac{1}{N_{\rm reg}}\sum\limits_i p_i^* L_{\rm reg}(t_i, t_i^*) \tag{1}$$
where $i$ is the index of a proposal in a mini-batch, $p_i$ is the predicted probability that proposal $i$ contains an object, $p_i^*$ is the ground-truth label (1 for a positive sample and 0 for a negative sample), $t_i$ and $t_i^*$ are the predicted and ground-truth bounding-box regression parameters, $L_{\rm cls}$ and $L_{\rm reg}$ are the classification and regression losses, $N_{\rm cls}$ and $N_{\rm reg}$ are normalization terms, and $\mu$ balances the two loss terms.
Fast RCNN consists of three parts. The first part is the pre-trained network that extracts the feature map: in the Fast RCNN detection model, region proposals are first extracted by the RPN model, and the corresponding regions on the feature map are then located. The second part is the RoI (region of interest) pooling layer: proposals of different sizes correspond to feature-map regions of different scales, so the RoI pooling layer resizes them to a common scale before the features are fed into the fully connected layers. The third part is the NMS algorithm that merges the detection boxes of the same object: before post-processing, each object has multiple detection boxes, and NMS removes the redundant ones.
2 Object detection based on the improved NMS algorithms
2.1 Non-maximum suppression algorithm[14]
When a test image passes through the object detection system without post-processing, each object yields multiple detection boxes, whereas the final result should contain exactly one box per object. GreedyNMS is therefore used to remove the redundant boxes. The GreedyNMS algorithm proceeds as follows:
1) Sort the detection box set $P$ in descending order of confidence score;
2) Move the highest-scoring box in $P$, the pre-selected box, to the output set;
3) Compute the IoU between each remaining box in $P$ and the pre-selected box, and remove every box whose IoU exceeds the threshold $T$;
4) Repeat steps 2) and 3) until the set $P$ is empty.
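The steps above can be sketched in Python; the box format `(x1, y1, x2, y2)` and the parameter name `t` are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, t=0.3):
    """GreedyNMS: repeatedly keep the highest-scoring box and drop every
    remaining box whose IoU with it exceeds the threshold t."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # step 1)
    keep = []
    while order:                                  # step 4): loop until the set is empty
        m = order.pop(0)                          # step 2): pre-selected box
        keep.append(m)
        order = [i for i in order                 # step 3): suppress high-IoU boxes
                 if iou(boxes[m], boxes[i]) <= t]
    return keep
```

For example, two heavily overlapping boxes collapse to one, while a distant box survives: `greedy_nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7], t=0.3)` keeps the first and third boxes.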
Analysis of GreedyNMS shows that whether a detection box is removed depends entirely on the threshold $T$: if $T$ is too small, adjacent true-positive boxes belonging to different objects are suppressed, causing missed detections; if $T$ is too large, false-positive boxes survive, causing false detections.
Fig. 2 shows detection results of Faster RCNN with GreedyNMS as the post-processing step. Fig. 2(a) illustrates a detection error caused by an ill-chosen threshold $T$.
2.2 Improved non-maximum suppression algorithm
As stated above, every neighboring detection box may detect the same object, yet GreedyNMS directly removes all boxes whose IoU with the pre-selected box exceeds the threshold. Instead of removing such boxes outright, the improved algorithms multiply their confidence scores by a proportional penalty factor determined by the IoU value, lowering the scores gradually over the iterations.
2.2.1 Piecewise proportional penalty factor NMS (Seg-NMS) algorithm
The proportional penalty factor of the piecewise proportional penalty factor NMS (Seg-NMS) algorithm is
$$\mu_i = f({\rm IoU}(p_m, p_i)) = \begin{cases} 1 & {\rm IoU}(p_m, p_i) < T \\ 1-\lg({\rm IoU}(p_m, p_i)+1) & {\rm IoU}(p_m, p_i) \ge T \end{cases} \tag{2}$$
where $p_m$ is the pre-selected box with the maximum score, $p_i$ is the $i$-th remaining detection box, and $T$ is the IoU threshold. Boxes whose IoU with $p_m$ is below $T$ keep their original scores, whereas boxes whose IoU is no less than $T$ have their scores multiplied by $1-\lg({\rm IoU}+1)$; after several iterations, boxes whose scores fall below the threshold $\sigma$ are removed. The performance of Seg-NMS, however, is still limited by the threshold $T$.
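A minimal sketch of Seg-NMS applying Eq. (2) per iteration. To keep the sketch self-contained it takes a precomputed IoU matrix instead of raw boxes; the default `sigma=0.001` is an assumption drawn from the range tested in the experiments.

```python
import math

def seg_nms(scores, iou_mat, t=0.3, sigma=0.001):
    """Seg-NMS sketch: iou_mat[i][j] is the IoU between boxes i and j.

    Each iteration moves the highest-scoring remaining box to the output,
    multiplies the other scores by the piecewise factor of Eq. (2), and
    drops boxes whose score falls below sigma.
    """
    scores = list(scores)
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])   # pre-selected box
        keep.append(m)
        remaining.remove(m)
        for i in remaining:
            o = iou_mat[m][i]
            if o >= t:                                # second branch of Eq. (2)
                scores[i] *= 1.0 - math.log10(o + 1.0)
        remaining = [i for i in remaining if scores[i] >= sigma]
    return keep
```

A box that heavily overlaps the pre-selected box is penalized round by round rather than deleted immediately, so a true positive with a lower but still high score can survive.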
2.2.2 Continuous proportional penalty factor NMS (Con-NMS) algorithm
As noted in Section 2.2.1, the proposed Seg-NMS algorithm is still limited by the threshold $T$. The Con-NMS algorithm therefore abandons $T$ and, in each iteration, directly lowers the scores of all detection boxes except the one with the maximum score. The Con-NMS algorithm is as follows:
Input: detection box set $P$ with confidence scores $S$, score threshold $\sigma$
Output: final detection set $D$
while $P \neq \varnothing$: move the box $p_m$ with the maximum score from $P$ to $D$
for each remaining box $p_i$ in $P$: $s_i \leftarrow s_i \cdot (1-\lg({\rm IoU}(p_m, p_i)+1))$
if $s_i < \sigma$: remove $p_i$ from $P$
return $D$
where $\sigma$ is the score threshold below which detection boxes are removed.
The proportional penalty factor of the Con-NMS algorithm is given in Eq. (3). In Con-NMS the penalty factor varies continuously: according to the IoU between each detection box and the pre-selected box, a larger IoU yields a smaller penalty factor and hence a larger score reduction.
$$\mu_i = f({\rm IoU}(p_m, p_i)) = 1-\lg({\rm IoU}(p_m, p_i)+1) \tag{3}$$
where $p_m$ denotes the pre-selected box with the maximum score and $p_i$ denotes the $i$-th remaining detection box.
The proportional penalty factors of the proposed Seg-NMS and Con-NMS algorithms are applied in every iteration of the post-processing step; after multiple iterations, the detection boxes whose confidence scores fall below the threshold $\sigma$ are removed.
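A minimal sketch of Con-NMS using the continuous factor of Eq. (3). As with the Seg-NMS sketch, it takes a precomputed IoU matrix to stay self-contained, and `sigma=0.001` is an assumed default from the tested range.

```python
import math

def con_nms(scores, iou_mat, sigma=0.001):
    """Con-NMS sketch: no IoU threshold T is used.

    Every remaining box is penalized by 1 - lg(IoU + 1), Eq. (3); boxes
    with zero IoU get a factor of 1 and are untouched. iou_mat[i][j] is
    the IoU between boxes i and j, and sigma is the removal threshold.
    """
    scores = list(scores)
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])    # box with the maximum score
        keep.append(m)
        remaining.remove(m)
        for i in remaining:                            # Eq. (3): continuous penalty
            scores[i] *= 1.0 - math.log10(iou_mat[m][i] + 1.0)
        remaining = [i for i in remaining if scores[i] >= sigma]
    return keep
```

Unlike Seg-NMS, no box is exempt from the penalty, so the algorithm's behavior depends only weakly on the score threshold, matching the observation that $\sigma$ barely affects mAP in Table 2.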
3 Experimental analysis
The Faster RCNN object detection model is trained with the four-stage alternating training scheme[9], following exactly the training procedure of Ref. [9].
The training of Faster RCNN comprises four stages:
1) Train the RPN model with a pre-trained network and output a large set of detection boxes;
2) Train the Fast RCNN model with the same pre-trained network and the detection boxes output in stage 1). Because stages 1) and 2) are trained separately, the pre-trained network parameters of the two stages necessarily differ at this point;
3) Initialize the pre-trained network of stage 1) with the parameters from stage 2), freeze those shared parameters, and retrain the RPN model, fine-tuning only the layers unique to the RPN;
4) Initialize Fast RCNN with the pre-trained network parameters from stage 3), freeze them, and fine-tune only the layers unique to Fast RCNN.
3.1 Experimental data and parameters
The experiments use the PASCAL VOC 2007 dataset[20], which contains 20 object categories and is divided into three parts: the VOC 2007 trainval training set, the VOC 2007 val validation set, and the VOC 2007 test set.
The pre-trained network is VGG16[21]; the model is trained on VOC 2007 trainval and evaluated on VOC 2007 test, with mAP as the evaluation metric of detection accuracy. The Faster RCNN model is built exactly as in Ref. [9].
The RPN is trained end to end. Each mini-batch contains 256 samples with a 1:1 ratio of positive to negative samples; when positives are insufficient, the batch is padded with negatives. The layer weights are initialized from a Gaussian distribution with mean 0 and standard deviation 0.01. The learning rate is 0.001 for the first 60 k iterations and 0.000 1 for the next 20 k iterations; the momentum and weight decay are 0.9 and 0.000 5, respectively.
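The hyperparameters above can be gathered into a config sketch; the key names are illustrative and not tied to any specific framework.

```python
# Hypothetical config mirroring the RPN training setup described above.
RPN_TRAIN_CONFIG = {
    "mini_batch_size": 256,
    "pos_neg_ratio": (1, 1),                     # pad with negatives if positives run short
    "weight_init": {"mean": 0.0, "std": 0.01},   # zero-mean Gaussian initialization
    "lr_schedule": [(0.001, 60_000), (0.0001, 20_000)],  # (learning rate, iterations)
    "momentum": 0.9,
    "weight_decay": 0.0005,
}
```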
3.2 Experimental results of the Seg-NMS and Con-NMS algorithms
Table 1 lists the experimental results of the Seg-NMS algorithm as the threshold $T$ varies from 0.3 to 0.7, and Table 2 lists the results of the Con-NMS algorithm as the score threshold $\sigma$ varies from 0.001 to 0.005.
Table 1 Experimental results of the Seg-NMS algorithm for different values of the parameter $T$

| $T$ | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
| --- | --- | --- | --- | --- | --- |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 56.4 |
Table 2 Experimental results of the Con-NMS algorithm for different values of the parameter $\sigma$

| $\sigma$ | 0.001 | 0.002 | 0.003 | 0.004 | 0.005 |
| --- | --- | --- | --- | --- | --- |
| Con-NMS/% | 71.6 | 71.3 | 71.2 | 71.2 | 71.1 |
3.3 Performance analysis
Table 3 compares the mAP of the GreedyNMS and Seg-NMS algorithms at different values of the threshold $T$.
Table 3 mAP of the GreedyNMS and Seg-NMS algorithms at different thresholds $T$

| Algorithm | $T$=0.3 | $T$=0.4 | $T$=0.5 | $T$=0.6 | $T$=0.7 |
| --- | --- | --- | --- | --- | --- |
| GreedyNMS[14]/% | 69.6 | 70.0 | 69.1 | 64.4 | 56.6 |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 57.0 |
Fig. 3 shows the detection results of the GreedyNMS algorithm at different values of the threshold $T$.
Owing to space limitations, only the test results of the GreedyNMS, Seg-NMS, and Con-NMS algorithms on part of the categories of the VOC 2007 test set are listed in Table 4, where the threshold $T$ of GreedyNMS and Seg-NMS is set to the best-performing value in Table 3.
Table 4 Test results on some categories of the VOC 2007 test set

| Algorithm | mAP | bicycle | boat | car | chair | table | motorbike | person | plant | sheep | train |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GreedyNMS[19]/% | 70.0 | 78.7 | 51.5 | 80.1 | 50.0 | 64.7 | 76.1 | 77.0 | 38.4 | 68.7 | 75.1 |
| Seg-NMS/% | 71.5 | 81.4 | 60.2 | 82.6 | 52.8 | 66.3 | 76.2 | 79.2 | 43.8 | 70.3 | 79.2 |
| Con-NMS/% | 71.6 | 82.0 | 57.9 | 81.6 | 52.4 | 66.8 | 78.7 | 79.3 | 43.0 | 71.5 | 80.9 |

Note: bold font marks the categories in which Seg-NMS and Con-NMS show an obvious accuracy improvement over GreedyNMS.
When both precision and recall are 80%, the missed and false detection rates of the train category decrease by 1.8% and 1.2%, respectively.
Fig. 6 shows the detection results of the GreedyNMS and Con-NMS algorithms; Figs. 6(a) and (c) show the results of GreedyNMS. In Fig. 6(a), GreedyNMS produces a false positive, that is, a false detection; in Fig. 6(c), GreedyNMS suppresses a true-positive detection box, that is, a missed detection. Figs. 6(b) and (d) show the results of Con-NMS. In contrast to the false and missed detections of GreedyNMS, Fig. 6(b) shows that Con-NMS suppresses the false positive (avoiding the false detection), and Fig. 6(d) shows that Con-NMS preserves the true-positive detection box suppressed by GreedyNMS, thereby avoiding the missed detection.
4 Conclusion
Because the traditional NMS algorithm easily causes missed and false detections, this paper proposes two improved NMS algorithms, the piecewise proportional penalty factor NMS (Seg-NMS) algorithm and the continuous proportional penalty factor NMS (Con-NMS) algorithm. By lowering the scores of detection boxes over multiple iterations, they avoid the missed and false detections that GreedyNMS produces through its hard IoU threshold $T$. On the PASCAL VOC 2007 dataset, Seg-NMS and Con-NMS raise the mAP of Faster RCNN by 1.5% and 1.6%, respectively, over the traditional NMS algorithm while keeping the same time complexity, and the proposed algorithms offer a general post-processing solution for other object detection models.
References
[1] Zhang H, Wang K F, Wang F Y. Advances and perspectives on applications of deep learning in visual object detection[J]. Acta Automatica Sinica, 2017, 43(8): 1289-1305. [DOI: 10.16383/j.aas.2017.c160822]
[2] Cai N, Zhou Y, Liu G, et al. Survey of robust principal component analysis methods for moving-object detection[J]. Journal of Image and Graphics, 2016, 21(10): 1265-1275. [DOI: 10.11834/jig.20161001]
[3] Zhang S F, Huang X H, Wang M. Algorithm of infrared background suppression and small target detection[J]. Journal of Image and Graphics, 2016, 21(8): 1039-1047. [DOI: 10.11834/jig.20160808]
[4] Shi X B, Zhang J, Dai Q, et al. A deformed object tracking method utilizing saliency segmentation and target detection[J]. Journal of Computer-Aided Design & Computer Graphics, 2016, 28(4): 644-652. [DOI: 10.3969/j.issn.1003-9775.2016.04.015]
[5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 580-587. [DOI: 10.1109/CVPR.2014.81]
[6] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 21-37. [DOI: 10.1007/978-3-319-46448-0_2]
[7] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 779-788. [DOI: 10.1109/CVPR.2016.91]
[8] van de Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1879-1886. [DOI: 10.1109/ICCV.2011.6126456]
[9] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. [DOI: 10.1109/TPAMI.2016.2577031]
[10] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1440-1448. [DOI: 10.1109/ICCV.2015.169]
[11] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. [DOI: 10.1109/TPAMI.2015.2389824]
[12] Shen Z Q, Liu Z, Li J G, et al. DSOD: learning deeply supervised object detectors from scratch[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1937-1945. [DOI: 10.1109/ICCV.2017.212]
[13] Hu H, Gu J Y, Zhang Z, et al. Relation networks for object detection[J]. arXiv: 1711.11575, 2018.
[14] Rosenfeld A, Thurston M. Edge and curve detection for visual scene analysis[J]. IEEE Transactions on Computers, 1971, C-20(5): 562-569. [DOI: 10.1109/T-C.1971.223290]
[15] Hosang J, Benenson R, Schiele B. Learning non-maximum suppression[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 6469-6477. [DOI: 10.1109/CVPR.2017.685]
[16] Stewart R, Andriluka M, Ng A Y. End-to-end people detection in crowded scenes[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2325-2333. [DOI: 10.1109/CVPR.2016.255]
[17] Henderson P, Ferrari V. End-to-end training of object class detectors for mean average precision[C]//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer, 2016: 198-213. [DOI: 10.1007/978-3-319-54193-8_13]
[18] Hosang J, Benenson R, Schiele B. A convnet for non-maximum suppression[C]//Proceedings of the 38th German Conference on Pattern Recognition. Hannover, Germany: Springer, 2016: 192-204. [DOI: 10.1007/978-3-319-45886-1_16]
[19] Chen J H, Ye X N. Improvement of non-maximum suppression in pedestrian detection[J]. Journal of East China University of Science and Technology: Natural Science Edition, 2015, 41(3): 371-378. [DOI: 10.3969/j.issn.1006-3080.2015.03.015]
[20] Everingham M, van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[21] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of 2015 International Conference on Learning Representations, 2015.