Published: 2018-11-16
Image Analysis and Recognition
Received: 2018-05-08; revised: 2018-06-25
Supported by: National Natural Science Foundation of China (61502168); Natural Science Foundation of Hebei Province (F2016502069)
About the first author:
Zhao Wenqing, born in 1973, female, professor, Ph.D. Her research interests include machine learning, data mining, Bayesian network learning, and applications of intelligent techniques in power systems. E-mail: jbzwq@126.com
Shao Xuqiang, male, associate professor. His research interests include computer animation, GPU parallel computing, and 3D reconstruction. E-mail: shaoxuqiang@163.com
CLC number: TP391
Document code: A
Article ID: 1006-8961(2018)11-1676-10
Abstract
Objective As a post-processing step of object detection, the non-maximum suppression (NMS) algorithm is used to remove redundant detection boxes. However, in every iteration NMS suppresses all detection boxes whose intersection-over-union (IoU) with the pre-selected box exceeds a given threshold, which easily causes missed and false detections; moreover, the choice of threshold has a crucial influence on the performance of the whole algorithm. To address this problem, this paper proposes two improved NMS algorithms: the piecewise proportional penalty factor NMS algorithm and the continuous proportional penalty factor NMS algorithm. In the continuous proportional penalty factor NMS algorithm, the threshold only slightly affects the result. Method The improved NMS algorithms first compute a proportional penalty factor for each detection box according to the IoU between the box and the pre-selected box; they then multiply the confidence score of the box by this factor, lowering the score round by round, and finally remove, after several iterations, the boxes whose scores fall below a threshold. Result On the PASCAL VOC 2007 dataset, the Faster RCNN detection model with the piecewise and continuous proportional penalty factor NMS algorithms improves the mean average precision (mAP) by 1.5% and 1.6%, respectively, over the traditional NMS algorithm. Taking the train category as an example, when both precision and recall are 80%, the missed and false detection rates for trains decrease by 1.8% and 1.2%, respectively. Compared with the traditional NMS algorithm, the proposed improved NMS algorithms effectively preserve true detection boxes and remove false-positive boxes, thereby lowering the missed and false detection rates. Conclusion With the same time complexity and running efficiency, the proposed improved NMS algorithms achieve a markedly higher mAP than the traditional NMS algorithm and provide a general solution for other object detection models.
Keywords
object detection; non-maximum suppression algorithm; detection boxes; scale factor; false positives
Abstract
Objective
Object detection has been a popular research topic in computer vision and is an essential component of security video surveillance systems and other computer vision applications. Image recognition based on convolutional neural networks has achieved remarkable results. Many current object detection pipelines based on deep learning can be divided into three stages: 1) extracting region proposals, 2) classifying and refining each region proposal, and 3) removing extra detection boxes that might belong to the same object. The non-maximum suppression (NMS) algorithm is frequently used in Stage 3 as an essential part of object detection and achieves impressive results. Although the NMS algorithm is a core part of object detection, most studies have focused on feature design, classifier design, and object proposals, and few studies on NMS itself exist. The NMS algorithm is used as a post-processing step of object detection to remove redundant detection boxes. However, it suppresses every detection box whose intersection-over-union (IoU) overlap with the pre-selected detection box exceeds the threshold. NMS may therefore remove a true-positive detection box that is adjacent to the pre-selected box with a high IoU value, and it may preserve a false-positive detection box because that box has a low IoU with the pre-selected box. Mean average precision (mAP) decreases as a result of these missed and false positives; the traditional NMS algorithm, also called GreedyNMS, thus easily causes missed and false detections.
Method
To overcome these shortcomings, an improved NMS algorithm is proposed that assigns a proportional penalty coefficient, determined by the IoU value, to reduce detection scores. The improved NMS algorithm includes the piecewise and the continuous proportional penalty factor NMS algorithms. The piecewise proportional penalty factor NMS algorithm reduces the scores of detection boxes whose IoU with the pre-selected box exceeds the threshold T, whereas detection boxes whose IoU is less than T retain their original scores; detection boxes whose scores fall below another threshold σ are removed after several iterations. The performance of this algorithm remains limited by the threshold T. The continuous proportional penalty factor NMS algorithm no longer uses threshold T but directly reduces the scores of all detection boxes, except the one with the maximum score, in each iteration; in this algorithm, the threshold only slightly affects performance. The improved NMS algorithm initially calculates the proportional penalty factors that correspond to the detection boxes according to their IoU values with the pre-selected detection box. It then multiplies the confidence scores of the detection boxes by the proportional penalty factors, reducing the scores round by round, and finally removes the detection boxes whose scores fall below the threshold after several iterations. The piecewise and continuous proportional penalty factor NMS algorithms are applied in each iteration of the post-processing step of object detection rather than in the region proposal network. The threshold in the continuous proportional penalty factor NMS algorithm affects performance far less than the threshold in GreedyNMS does.
In addition, the computational complexity of the improved NMS algorithm is O(n²), the same as that of GreedyNMS, where n denotes the number of detection boxes.
Result
With the piecewise and the continuous proportional penalty factor NMS algorithms, the mAP of Faster RCNN on the PASCAL VOC 2007 dataset increases by 1.5% and 1.6%, respectively, compared with the traditional NMS algorithm. Taking the train category as an example, when both precision and recall are 80%, the missed and false detection rates decrease by 1.8% and 1.2%, respectively. Compared with GreedyNMS, the improved NMS algorithms effectively preserve true-positive detection boxes and remove false-positive ones.
Conclusion
With the same time complexity and running efficiency, the proposed improved NMS algorithms achieve a considerably higher mAP than the traditional NMS algorithm and provide a general solution for other object detection models.
Key words
object detection; non-maximum suppression algorithm; detection boxes; scale factor; false positives
0 Introduction
As a popular research topic in computer vision, object detection is widely applied in video surveillance, autonomous driving, intelligent transportation, and other fields[1-4]. Detection models based on deep convolutional neural networks, chiefly the RCNN[5], SSD[6], and YOLO[7] families, have made remarkable progress in object detection[1]. Such detection algorithms proceed in three steps: first, extract candidate detection boxes, for example with Selective Search[8], sliding windows, or region proposal networks (RPN)[9]; second, classify each detection box and regress its position with a convolutional neural network; third, merge the detection boxes belonging to the same object. The third step, an indispensable part of the detection pipeline, usually employs the NMS algorithm, which has performed well in [5-7, 9-13]. NMS was first applied to edge detection[14] and was later extended to face detection, object detection, and other research fields. However, the traditional NMS algorithm[14] is a greedy algorithm (GreedyNMS): when merging detection boxes, it suppresses every box whose IoU with the pre-selected box exceeds a given threshold, which easily causes missed and false detections.
Therefore, Ref. [15] replaces the NMS algorithm with a convolutional neural network that learns to generate a specific detection box for each object from the box coordinates, width, height, and confidence. Ref. [16] directly generates a sparse set of proposal boxes and trains an LSTM in place of NMS. Ref. [17] proposes a truly end-to-end training scheme that includes the NMS algorithm in the training stage, so that the classifier is aware of NMS at test time. Ref. [18] proposes the Tyrolean network, which rescores all proposal boxes and learns the best output; however, the accuracy gain comes with a large increase in computation, which slows down the whole detection system. Although these methods play a role similar to that of NMS, none of them outperforms it, and some require far more computation than NMS. The NMS algorithm therefore remains the method of choice for merging detection boxes.
Because, at a given threshold T, GreedyNMS may both remove true-positive detection boxes and retain false-positive ones, this paper proposes two improved NMS algorithms, the piecewise proportional penalty factor NMS (Seg-NMS) algorithm and the continuous proportional penalty factor NMS (Con-NMS) algorithm, which lower the scores of detection boxes gradually instead of removing them outright.
1 Faster RCNN object detection model
The Faster RCNN detection pipeline consists of two parts, the RPN model and the Fast RCNN model, as shown in Fig. 1.
The main role of the RPN model is to replace the selective search (SS) algorithm in extracting object proposals from the image and to classify and regress the position of each proposal. In classification, the RPN only separates proposals into two classes, foreground and background. The RPN first extracts features with a pre-trained image classification model: it takes the feature map of the last convolutional layer, maps each point on the feature map back to the corresponding point on the input image, and generates nine rectangular detection boxes of different scales around that point. It then classifies and regresses each detection box and finally applies the NMS algorithm to remove heavily overlapping boxes, passing the high-scoring ones to the Fast RCNN model as input. The RPN can thus fully replace the SS algorithm and extracts detection boxes markedly faster.
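The nine-anchor generation described above can be sketched as follows. The specific scales and aspect ratios are assumptions borrowed from the standard Faster RCNN configuration; the text only states that nine boxes of different scales are generated per feature-map point.

```python
import math

def generate_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate 9 anchors (x1, y1, x2, y2) centered at an image point (cx, cy).

    Each anchor has area scale**2 and height/width ratio r. The scale and
    ratio values are assumed defaults, not taken from this paper.
    """
    anchors = []
    for s in scales:
        for r in ratios:                    # r = height / width
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Three scales times three ratios yield the nine boxes per point that the RPN classifies and regresses.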
During training, the RPN usually labels a rectangular proposal as a positive sample when its IoU with a ground-truth box exceeds 0.7 and as a negative sample when the IoU is below 0.3; the remaining proposals do not contribute to training. The RPN is trained by minimizing the multi-task loss
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{\rm cls}}\sum\limits_i L_{\rm cls}(p_i, p_i^*) + \mu\frac{1}{N_{\rm reg}}\sum\limits_i p_i^* L_{\rm reg}(t_i, t_i^*) \tag{1}$$
where $i$ is the index of a proposal in a mini-batch, $p_i$ is the predicted probability that proposal $i$ contains an object, $p_i^*$ is the ground-truth label (1 for a positive sample and 0 for a negative sample), $t_i$ and $t_i^*$ are the predicted and ground-truth bounding-box regression parameters, $L_{\rm cls}$ and $L_{\rm reg}$ are the classification and regression losses, $N_{\rm cls}$ and $N_{\rm reg}$ are normalization terms, and $\mu$ balances the two loss terms.
Fast RCNN consists of three parts. The first part is the pre-trained network that extracts the feature map: in the Fast RCNN detection model, region proposals are first extracted by the RPN model, and the corresponding regions on the feature map are then located. The second part is the RoI (region of interest) pooling layer: proposals of different sizes correspond to feature-map regions of different scales, so the RoI pooling layer resizes them to a common scale before the features are fed into the fully connected layers. The third part is the NMS algorithm that merges the detection boxes of the same object: before post-processing, each object has multiple detection boxes, and NMS removes the redundant ones.
2 Object detection based on the improved NMS algorithms
2.1 Non-maximum suppression algorithm[14]
When a test image passes through the object detection system without post-processing, each object yields multiple detection boxes, whereas the final result should contain exactly one box per object. GreedyNMS is therefore used to remove the redundant boxes. The GreedyNMS algorithm proceeds as follows:
1) Sort the detection box set $P$ in descending order of confidence score;
2) Move the highest-scoring box in $P$, the pre-selected box, to the output set;
3) Compute the IoU between each remaining box in $P$ and the pre-selected box, and remove every box whose IoU exceeds the threshold $T$;
4) Repeat steps 2) and 3) until the set $P$ is empty.
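The steps above can be sketched in Python; the box format `(x1, y1, x2, y2)` and the parameter name `t` are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, t=0.3):
    """GreedyNMS: repeatedly keep the highest-scoring box and drop every
    remaining box whose IoU with it exceeds the threshold t."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # step 1)
    keep = []
    while order:                                  # step 4): loop until the set is empty
        m = order.pop(0)                          # step 2): pre-selected box
        keep.append(m)
        order = [i for i in order                 # step 3): suppress high-IoU boxes
                 if iou(boxes[m], boxes[i]) <= t]
    return keep
```

For example, two heavily overlapping boxes collapse to one, while a distant box survives: `greedy_nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7], t=0.3)` keeps the first and third boxes.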
Analysis of GreedyNMS shows that whether a detection box is removed depends entirely on the threshold $T$: if $T$ is too small, adjacent true-positive boxes belonging to different objects are suppressed, causing missed detections; if $T$ is too large, false-positive boxes survive, causing false detections.
Fig. 2 shows detection results of Faster RCNN with GreedyNMS as the post-processing step. Fig. 2(a) illustrates a detection error caused by an ill-chosen threshold $T$.
2.2 Improved non-maximum suppression algorithm
As stated above, every neighboring detection box may detect the same object, yet GreedyNMS directly removes all boxes whose IoU with the pre-selected box exceeds the threshold. Instead of removing such boxes outright, the improved algorithms multiply their confidence scores by a proportional penalty factor determined by the IoU value, lowering the scores gradually over the iterations.
2.2.1 Piecewise proportional penalty factor NMS (Seg-NMS) algorithm
The proportional penalty factor of the piecewise proportional penalty factor NMS (Seg-NMS) algorithm is
$$\mu_i = f({\rm IoU}(p_m, p_i)) = \begin{cases} 1 & {\rm IoU}(p_m, p_i) < T \\ 1-\lg({\rm IoU}(p_m, p_i)+1) & {\rm IoU}(p_m, p_i) \ge T \end{cases} \tag{2}$$
where $p_m$ is the pre-selected box with the maximum score, $p_i$ is the $i$-th remaining detection box, and $T$ is the IoU threshold. Boxes whose IoU with $p_m$ is below $T$ keep their original scores, whereas boxes whose IoU is no less than $T$ have their scores multiplied by $1-\lg({\rm IoU}+1)$; after several iterations, boxes whose scores fall below the threshold $\sigma$ are removed. The performance of Seg-NMS, however, is still limited by the threshold $T$.
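A minimal sketch of Seg-NMS applying Eq. (2) per iteration. To keep the sketch self-contained it takes a precomputed IoU matrix instead of raw boxes; the default `sigma=0.001` is an assumption drawn from the range tested in the experiments.

```python
import math

def seg_nms(scores, iou_mat, t=0.3, sigma=0.001):
    """Seg-NMS sketch: iou_mat[i][j] is the IoU between boxes i and j.

    Each iteration moves the highest-scoring remaining box to the output,
    multiplies the other scores by the piecewise factor of Eq. (2), and
    drops boxes whose score falls below sigma.
    """
    scores = list(scores)
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])   # pre-selected box
        keep.append(m)
        remaining.remove(m)
        for i in remaining:
            o = iou_mat[m][i]
            if o >= t:                                # second branch of Eq. (2)
                scores[i] *= 1.0 - math.log10(o + 1.0)
        remaining = [i for i in remaining if scores[i] >= sigma]
    return keep
```

A box that heavily overlaps the pre-selected box is penalized round by round rather than deleted immediately, so a true positive with a lower but still high score can survive.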
2.2.2 Continuous proportional penalty factor NMS (Con-NMS) algorithm
As noted in Section 2.2.1, the proposed Seg-NMS algorithm is still limited by the threshold $T$. The Con-NMS algorithm therefore abandons $T$ and, in each iteration, directly lowers the scores of all detection boxes except the one with the maximum score. The Con-NMS algorithm is as follows:
Input: detection box set $P$ with confidence scores $S$, score threshold $\sigma$
Output: final detection set $D$
while $P \neq \varnothing$: move the box $p_m$ with the maximum score from $P$ to $D$
for each remaining box $p_i$ in $P$: $s_i \leftarrow s_i \cdot (1-\lg({\rm IoU}(p_m, p_i)+1))$
if $s_i < \sigma$: remove $p_i$ from $P$
return $D$
where $\sigma$ is the score threshold below which detection boxes are removed.
The proportional penalty factor of the Con-NMS algorithm is given in Eq. (3). In Con-NMS the penalty factor varies continuously: according to the IoU between each detection box and the pre-selected box, a larger IoU yields a smaller penalty factor and hence a larger score reduction.
$$\mu_i = f({\rm IoU}(p_m, p_i)) = 1-\lg({\rm IoU}(p_m, p_i)+1) \tag{3}$$
where $p_m$ denotes the pre-selected box with the maximum score and $p_i$ denotes the $i$-th remaining detection box.
The proportional penalty factors of the proposed Seg-NMS and Con-NMS algorithms are applied in every iteration of the post-processing step; after multiple iterations, the detection boxes whose confidence scores fall below the threshold $\sigma$ are removed.
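A minimal sketch of Con-NMS using the continuous factor of Eq. (3). As with the Seg-NMS sketch, it takes a precomputed IoU matrix to stay self-contained, and `sigma=0.001` is an assumed default from the tested range.

```python
import math

def con_nms(scores, iou_mat, sigma=0.001):
    """Con-NMS sketch: no IoU threshold T is used.

    Every remaining box is penalized by 1 - lg(IoU + 1), Eq. (3); boxes
    with zero IoU get a factor of 1 and are untouched. iou_mat[i][j] is
    the IoU between boxes i and j, and sigma is the removal threshold.
    """
    scores = list(scores)
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])    # box with the maximum score
        keep.append(m)
        remaining.remove(m)
        for i in remaining:                            # Eq. (3): continuous penalty
            scores[i] *= 1.0 - math.log10(iou_mat[m][i] + 1.0)
        remaining = [i for i in remaining if scores[i] >= sigma]
    return keep
```

Unlike Seg-NMS, no box is exempt from the penalty, so the algorithm's behavior depends only weakly on the score threshold, matching the observation that $\sigma$ barely affects mAP in Table 2.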
3 Experimental analysis
The Faster RCNN object detection model is trained with the four-stage alternating training scheme[9], following exactly the training procedure of Ref. [9].
The training of Faster RCNN comprises four stages:
1) Train the RPN model with a pre-trained network and output a large set of detection boxes;
2) Train the Fast RCNN model with the same pre-trained network and the detection boxes output in stage 1). Because stages 1) and 2) are trained separately, the pre-trained network parameters of the two stages necessarily differ at this point;
3) Initialize the pre-trained network of stage 1) with the parameters from stage 2), freeze those shared parameters, and retrain the RPN model, fine-tuning only the layers unique to the RPN;
4) Initialize Fast RCNN with the pre-trained network parameters from stage 3), freeze them, and fine-tune only the layers unique to Fast RCNN.
3.1 Experimental data and parameters
The experiments use the PASCAL VOC 2007 dataset[20], which contains 20 object categories and is divided into three parts: the VOC 2007 trainval training set, the VOC 2007 val validation set, and the VOC 2007 test set.
The pre-trained network is VGG16[21]; the model is trained on VOC 2007 trainval and evaluated on VOC 2007 test, with mAP as the evaluation metric of detection accuracy. The Faster RCNN model is built exactly as in Ref. [9].
The RPN is trained end to end. Each mini-batch contains 256 samples with a 1:1 ratio of positive to negative samples; when positives are insufficient, the batch is padded with negatives. The layer weights are initialized from a Gaussian distribution with mean 0 and standard deviation 0.01. The learning rate is 0.001 for the first 60 k iterations and 0.000 1 for the next 20 k iterations; the momentum and weight decay are 0.9 and 0.000 5, respectively.
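The hyperparameters above can be gathered into a config sketch; the key names are illustrative and not tied to any specific framework.

```python
# Hypothetical config mirroring the RPN training setup described above.
RPN_TRAIN_CONFIG = {
    "mini_batch_size": 256,
    "pos_neg_ratio": (1, 1),                     # pad with negatives if positives run short
    "weight_init": {"mean": 0.0, "std": 0.01},   # zero-mean Gaussian initialization
    "lr_schedule": [(0.001, 60_000), (0.0001, 20_000)],  # (learning rate, iterations)
    "momentum": 0.9,
    "weight_decay": 0.0005,
}
```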
3.2 Experimental results of the Seg-NMS and Con-NMS algorithms
Table 1 lists the experimental results of the Seg-NMS algorithm as the threshold $T$ varies from 0.3 to 0.7, and Table 2 lists the results of the Con-NMS algorithm as the score threshold $\sigma$ varies from 0.001 to 0.005.
Table 1 Experimental results of the Seg-NMS algorithm for different values of the parameter $T$

| $T$ | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
| --- | --- | --- | --- | --- | --- |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 56.4 |
Table 2 Experimental results of the Con-NMS algorithm for different values of the parameter $\sigma$

| $\sigma$ | 0.001 | 0.002 | 0.003 | 0.004 | 0.005 |
| --- | --- | --- | --- | --- | --- |
| Con-NMS/% | 71.6 | 71.3 | 71.2 | 71.2 | 71.1 |
3.3 Performance analysis
Table 3 compares the mAP of the GreedyNMS and Seg-NMS algorithms at different values of the threshold $T$.
Table 3 mAP of the GreedyNMS and Seg-NMS algorithms at different thresholds $T$

| Algorithm | $T$=0.3 | $T$=0.4 | $T$=0.5 | $T$=0.6 | $T$=0.7 |
| --- | --- | --- | --- | --- | --- |
| GreedyNMS[14]/% | 69.6 | 70.0 | 69.1 | 64.4 | 56.6 |
| Seg-NMS/% | 71.5 | 71.4 | 68.8 | 64.6 | 57.0 |
Fig. 3 shows the detection results of the GreedyNMS algorithm at different values of the threshold $T$.
Owing to space limitations, only the test results of the GreedyNMS, Seg-NMS, and Con-NMS algorithms on part of the categories of the VOC 2007 test set are listed in Table 4, where the threshold $T$ of GreedyNMS and Seg-NMS is set to the best-performing value in Table 3.
Table 4 Test results on some categories of the VOC 2007 test set

| Algorithm | mAP | bicycle | boat | car | chair | table | motorbike | person | plant | sheep | train |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GreedyNMS[19]/% | 70.0 | 78.7 | 51.5 | 80.1 | 50.0 | 64.7 | 76.1 | 77.0 | 38.4 | 68.7 | 75.1 |
| Seg-NMS/% | 71.5 | 81.4 | 60.2 | 82.6 | 52.8 | 66.3 | 76.2 | 79.2 | 43.8 | 70.3 | 79.2 |
| Con-NMS/% | 71.6 | 82.0 | 57.9 | 81.6 | 52.4 | 66.8 | 78.7 | 79.3 | 43.0 | 71.5 | 80.9 |

Note: bold font marks the categories in which Seg-NMS and Con-NMS show an obvious accuracy improvement over GreedyNMS.
When both precision and recall are 80%, the missed and false detection rates of the train category decrease by 1.8% and 1.2%, respectively.
Fig. 6 shows the detection results of the GreedyNMS and Con-NMS algorithms; Figs. 6(a) and (c) show the results of GreedyNMS. In Fig. 6(a), GreedyNMS produces a false positive, that is, a false detection; in Fig. 6(c), GreedyNMS suppresses a true-positive detection box, that is, a missed detection. Figs. 6(b) and (d) show the results of Con-NMS. In contrast to the false and missed detections of GreedyNMS, Fig. 6(b) shows that Con-NMS suppresses the false positive (avoiding the false detection), and Fig. 6(d) shows that Con-NMS preserves the true-positive detection box suppressed by GreedyNMS, thereby avoiding the missed detection.
4 Conclusion
Because the traditional NMS algorithm easily causes missed and false detections, this paper proposes two improved NMS algorithms, the piecewise proportional penalty factor NMS (Seg-NMS) algorithm and the continuous proportional penalty factor NMS (Con-NMS) algorithm. By lowering the scores of detection boxes over multiple iterations, they avoid the missed and false detections that GreedyNMS produces through its hard IoU threshold $T$. On the PASCAL VOC 2007 dataset, Seg-NMS and Con-NMS raise the mAP of Faster RCNN by 1.5% and 1.6%, respectively, over the traditional NMS algorithm while keeping the same time complexity, and the proposed algorithms offer a general post-processing solution for other object detection models.
References
[1] Zhang H, Wang K F, Wang F Y. Advances and perspectives on applications of deep learning in visual object detection[J]. Acta Automatica Sinica, 2017, 43(8): 1289-1305. [DOI: 10.16383/j.aas.2017.c160822]
[2] Cai N, Zhou Y, Liu G, et al. Survey of robust principal component analysis methods for moving-object detection[J]. Journal of Image and Graphics, 2016, 21(10): 1265-1275. [DOI: 10.11834/jig.20161001]
[3] Zhang S F, Huang X H, Wang M. Algorithm of infrared background suppression and small target detection[J]. Journal of Image and Graphics, 2016, 21(8): 1039-1047. [DOI: 10.11834/jig.20160808]
[4] Shi X B, Zhang J, Dai Q, et al. A deformed object tracking method utilizing saliency segmentation and target detection[J]. Journal of Computer-Aided Design & Computer Graphics, 2016, 28(4): 644-652. [DOI: 10.3969/j.issn.1003-9775.2016.04.015]
[5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 580-587. [DOI: 10.1109/CVPR.2014.81]
[6] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 21-37. [DOI: 10.1007/978-3-319-46448-0_2]
[7] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 779-788. [DOI: 10.1109/CVPR.2016.91]
[8] van de Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1879-1886. [DOI: 10.1109/ICCV.2011.6126456]
[9] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. [DOI: 10.1109/TPAMI.2016.2577031]
[10] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1440-1448. [DOI: 10.1109/ICCV.2015.169]
[11] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. [DOI: 10.1109/TPAMI.2015.2389824]
[12] Shen Z Q, Liu Z, Li J G, et al. DSOD: learning deeply supervised object detectors from scratch[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1937-1945. [DOI: 10.1109/ICCV.2017.212]
[13] Hu H, Gu J Y, Zhang Z, et al. Relation networks for object detection[J]. arXiv: 1711.11575, 2018.
[14] Rosenfeld A, Thurston M. Edge and curve detection for visual scene analysis[J]. IEEE Transactions on Computers, 1971, C-20(5): 562-569. [DOI: 10.1109/T-C.1971.223290]
[15] Hosang J, Benenson R, Schiele B. Learning non-maximum suppression[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 6469-6477. [DOI: 10.1109/CVPR.2017.685]
[16] Stewart R, Andriluka M, Ng A Y. End-to-end people detection in crowded scenes[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2325-2333. [DOI: 10.1109/CVPR.2016.255]
[17] Henderson P, Ferrari V. End-to-end training of object class detectors for mean average precision[C]//Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer, 2016: 198-213. [DOI: 10.1007/978-3-319-54193-8_13]
[18] Hosang J, Benenson R, Schiele B. A convnet for non-maximum suppression[C]//Proceedings of the 38th German Conference on Pattern Recognition. Hannover, Germany: Springer, 2016: 192-204. [DOI: 10.1007/978-3-319-45886-1_16]
[19] Chen J H, Ye X N. Improvement of non-maximum suppression in pedestrian detection[J]. Journal of East China University of Science and Technology: Natural Science Edition, 2015, 41(3): 371-378. [DOI: 10.3969/j.issn.1006-3080.2015.03.015]
[20] Everingham M, van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[21] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of 2015 International Conference on Learning Representations, 2015.