发布时间: 2019-09-16
摘要点击次数:
全文下载次数:
DOI: 10.11834/jig.180636
2019 | Volume 24 | Number 9

图像分析和识别

结合网格密度聚类的行人检测候选域生成

成科扬^1,2, 周博文¹, 李世超¹, 孙爽¹

1. 江苏大学计算机科学与通信工程学院, 镇江 212000;

2. 社会安全风险感知与防控大数据应用国家工程实验室, 北京 100846

收稿日期: 2018-12-06; 修回日期: 2019-03-20

基金项目: 国家自然科学基金项目（61602215，61672268）；社会安全风险感知与防控大数据应用国家工程实验室主任基金项目

第一作者简介: 成科扬, 1982年生, 男, 副教授, 主要研究方向为计算机视觉、模式识别。E-mail:kycheng@ujs.edu.cn;
李世超, 男, 硕士研究生, 主要研究方向为计算机视觉、模式识别、图像处理。E-mail:seancharlie@foxmail.com;
孙爽, 男, 硕士研究生, 主要研究方向为背景建模、运动目标检测。E-mail:1432874828@qq.com.

中图法分类号: TP391

文献标识码: A

文章编号: 1006-8961(2019)09-1482-11

摘要

目的行人检测是计算机视觉领域中的重点研究问题。经典的可变形部件模型（DPM）算法在行人检测领域素有高检测精度的优点，但由于在构建特征金字塔前处理过多召回率低的候选区域，导致计算速度偏慢，严重影响系统的实时性。针对该问题，本文对模型中选取候选检测区域的流程进行了改进，提出一种结合网格密度聚类算法和选择性搜索算法的行人检测候选对象生成方法来改进DPM模型。方法首先使用三帧差法和高斯混合模型收集固定数量的运动物体坐标点，然后结合基于网格密度的聚类算法构建网格坐标模型，生成目标频繁运动区域，同时进行动态掩层处理。随后引入改进的选择性搜索算法，结合支持向量机（SVM）训练得到的行人轮廓宽高比，提取该区域中高置信度的行人候选检测窗口，从而排除大量冗余的区域假设，完成对候选行人检测区域的精筛选，最后融合至DPM算法进行行人检测。结果所提方法在PETS 2009 Bench-mark数据集上进行检测，实验结果表明，该方法对复杂背景下的检测有较强的稳定性，与传统DPM模型相比，精度提高了1.71%、平均对数漏检率降低2.2%、检测速度提高为3.7倍左右。结论本文提出一种基于网格密度聚类的行人检测候选域生成算法，能够有效表达行人信息，与其他行人检测算法相比，有更好的精度和更快的速度，在检测率、检测时间方面均有提高，能够实现有效、快速的行人检测，具有实际意义。

关键词

可变形部件模型; 网格密度; 选择性搜索; 行人检测; 候选窗口

Candidate domain generation for pedestrian detection combined with grid density clustering

Cheng Keyang^1,2, Zhou Bowen¹, Li Shichao¹, Sun Shuang¹

1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212000, China;

2. National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data, Beijing 100846, China

Supported by: National Natural Science Foundation of China(61602215, 61672268)

Abstract

Objective Pedestrian detection is an important research topic in computer vision and plays a crucial role in all parts of life. The deformable part model (DPM) algorithm is a graphical model (Markov random field) that uses a series of parts and the spatial positional relationship of parts to represent an object. The DPM algorithm achieves superior detection accuracy in the field of pedestrian detection. However, because the DPM algorithm uses a sliding window strategy to search for objects in images, it handles a large number of candidate areas with low recall rates before constructing the feature pyramid. This property restricts the detection efficiency of the DPM algorithm. In view of this problem, this study proposes a novel model to improve the process of selecting candidate detection regions and puts forward the DPM algorithm integrated with a clustering algorithm with grid density and a selective search algorithm. Compared with the sliding window search method, the proposed model can provide fewer proposal windows and thus reduces computational complexity. Therefore, this study focuses on the advantages of the proposed model and the DPM algorithm to improve detection efficiency and accuracy. Method The proposed model contains three modules:the collection module of the coordinate points of moving targets, the generation module of frequently moving regions of targets, and the generation module of candidate detection windows. The modules are executed in series. Three frame difference methods and Gauss mixture models are used to detect moving targets in the first module. The centroid coordinates of each effective target are calculated by the obtained object contour, and a certain number of moving object coordinate points are collected and stored in the queue. In the second module, a G-cluster clustering algorithm based on grid structure and DBSCAN (density-based spatial clustering of applications with noise) clustering is proposed. The greatest advantage of the DBSCAN clustering algorithm is that it can find clusters of different sizes and shapes. However, this algorithm requires traversing every data point in the dataset, thereby leading to a high running time cost. Therefore, we draw lessons from the idea of the grid clustering algorithm based on the DBSCAN clustering algorithm. We develop a grid coordinate model and use the sliding window search method with an adaptive step size instead of neighborhood search to improve the method. This method can greatly reduce the number of data point searches and accelerate the speed of clustering. The specific experimental steps are as follows. The data points in the queue (QUE) are read, and the grid coordinate model is constructed. Frequently moving regions of targets are found by the sliding window strategy with an adaptive step size. Each region is dynamically adjusted with the aid of the moving object detection algorithm, and then the non-frequently moving regions are processed by masking to improve the effects of the candidate window generation module following the generation of frequently moving regions. Finally, the processed images are entered into the next module. In the third module, an improved selective search algorithm is introduced. As a mature image segmentation algorithm, the selective search algorithm can detect object proposals rapidly and effectively. It can also satisfy the accuracy and real-time requirements of pedestrian detection. Therefore, this thesis uses the selective search algorithm to extract a target from an image, and obtains a series of windows with high probability of complete target extraction. To further exclude the candidate windows with low credibility, we count and adjust the width-height ratio of the pedestrian contours in the INRIA dataset of public pedestrian image datasets. The range of the target width-height ratio is considered the condition for further screening candidate windows. Then, according to the coordinates of the final candidate detection window on the image, the corresponding features are extracted and entered into the classifier of the DPM algorithm. The final pedestrian detection window is obtained by classification. Result Experiments on the PETS 2009 Bench-mark dataset were conducted to evaluate the performance of the proposed algorithm. Results indicated that compared with the sliding window search strategy, our algorithm effectively reduced some redundant windows. Moreover, the detection efficiency of the DPM algorithm showed improvement. The average precision of the proposed method increased by 1.71%, the LAMR (log-average miss rate) decreased by 2.2%, and speed increased by more than threefold. Conclusion To deal with the problems of high computational complexity and high LAMR of the classical DPM algorithm, this study proposes a candidate field generation algorithm for pedestrian detection based on grid density clustering to improve the DPM model. This algorithm can realize effective candidate detection with a high recall rate, effectively improve the detection accuracy of the model, reduce the LAMR, and accelerate the detection speed. Furthermore, the proposed algorithm can effectively improve the pedestrian detection performance of the DPM model. However, the processing speed of the three-frame difference method and the Gauss model for background migration still requires improvement in the detection process, and further research is required for background migration processing.

Key words

deformable part model; grid density; selective search; pedestrian detection; candidate window

0 引言

近年来，人工智能开始服务于社会。行人检测技术是人工智能方法在视频监控中的核心内容，在自动驾驶、智能交通、公共安全监控等领域都是不可或缺的重要组成部分，具有重要的研究意义。目前实际应用中，行人检测主要采用基于特征提取的统计学习识别方法，该方法对硬件配置要求不高，在保证较好检测精度情形下，检测速度能够满足实时性要求，适用范围较广。常见的特征提取方法是通过不同的算法提取行人的外观、形状、纹理、颜色等特征，然后利用模式识别方法进行分类判别。不同于类似自行车、汽车等刚性目标，行人是非刚性物体，会发生形状变化，导致检测难度大，准确度不高。针对这一问题，Felzenswalb等人^[1]提出了基于统计学习的可变形部件模型(DPM)，该模型采用整体与部件联合进行检测，具有很高的检测精度，并在一定程度上能够克服行人部分遮挡问题，基本结构如图 1所示。

图 1 可变形部件模型的基本结构

Fig. 1 Basic structure of deformable part model

但该模型存在着目标特征维数较高导致检测时间过长问题，影响检测速度的主要因素是特征计算和对象定位。为解决这一问题，学者们提出了诸多改进方法。有学者着眼于对象定位的优化，杨扬等人^[2]提出一种可变形部件模型候选点检测算法，利用分割方法对检测的图像进行预处理，以得到的分割区域的左上角附件为候选点来减少候选检测区域，与运用滑动窗口进行行人检测的搜索相比，提高了检测速度；Wu等人^[3]采用经典的DPM模型，通过二值范数梯度(BING)^[4]及Edge-Boxes^[5]模型得到目标区域，构建每一个目标区域的特征金字塔，使用DPM模型进行窗口确认，但该方法较为繁琐，难以运用到实际中。Pedersoli等人^[6]提出一个多分辨率的分层部件模型和一个相应的从粗到精的推理过程，该过程从搜索空间中递归地消除未发现部分的位置，虽然达到了一定的提速效果，但影响了检测精度。也有学者通过算法优化特征计算来提高检测速度，Felzenszwalb等人^[7]在DPM基础上，提出了一种star-cascade DPM模型来加速检测，在较低的精度损失下，极大提高了模型的检测效率，但是存在较高的误检数(FP)。Song等人^[8]对部件中的稀疏线性组合进行编码，用来压缩部件数量，但对检测精度影响较大。

近年来，深度学习技术被应用于目标识别和行人检测领域，Girshick等人^[9-10]分析了传统的DPM方法和深度卷积神经网络(CNN)的特点和共性。Ren等人^[11]和Agrawal等人^[12]避开人工设计特征的提取，提出了基于区域卷积神经网络(R-CNN)的方法，通过调整深度网络的层数和卷积核大小等优化网络参数，让网络自动提取特征，但对计算机资源消耗较大，尚不能适用于分布式前端视频处理的要求。

1 可变形部件模型

完整的DPM模型由两层滤波器和变形关系组成，分别为一个根滤波器、部件滤波器和弹性变形关系。该模型采用了多组件模型的策略，将检测目标拆分为主体和部件，以行人为例，通过根滤波器检测行人的整体轮廓，匹配行人的全局特征，而部件滤波器则用来更加细致地检测行人某个具有明显判别作用的局部特征，例如头部、手臂等。由于行人这类非刚性目标的姿势和动作会发生显著变化，DPM模型这种基于星型结构的部件模型相应地采用的是弹性变形关系，即弹性模板，并通过该模板计算根部与部件之间的弹性变形损耗，增加检测的鲁棒性。

DPM的行人检测模型如图 2所示，图 2(a)为行人目标的根滤波器模型，图 2 (b)为行人目标的部件滤波器模型，该模型的分辨率是根滤波器的两倍，图 2 (c)为根滤波器与部件滤波器的变形关系，颜色越浅表示该部件偏离锚点位置的距离越大，产生的形变消耗越多，反之，形变消耗越少。

图 2 DPM的行人检测模型

Fig. 2 DPM pedestrian detection model((a)root filter; (b)part filter; (c)deformation relation)

通过可变形部件模型对图像进行检测，目标检测的最终得分是根滤波器和部件滤波器在特定位置上的得分加上偏差变量的总和，每个部件滤波器的得分等于该部件模型在图像的多个尺度上的所有位置的最大得分值，同时减去该部件滤波器相对于根滤波器的相对位置造成的偏移损耗，即为最终的部件滤波器分值，其得分计算为

$\begin{array}{l} G(z) = \sum\limits_{i = 0}^n {{F_i}} \cdot \varphi \left( {H, {p_i}} \right) - \\ \;\;\;\;\sum\limits_{i = 1}^n {{d_i}} \cdot {\phi _d}\left( {{\rm{d}}{x_i}, {\rm{d}}{y_i}} \right) + b \end{array} $

(1)

式中，$\left({{\rm{d}}{x_i}, {\rm{d}}{y_i}} \right) = \left({{x_i}, {y_i}} \right) - \left({2\left({{x_0}, {y_0}} \right) + {v_i}} \right)$，$z = \left({{p_0}, \cdots, {p_n}} \right)$。式(1)第1项表示根滤波器和部件滤波器的响应分数，通过$\varphi \left({H, {p_i}} \right)$函数计算每个部件滤波器相对于特征金字塔中的位置，其中$H$表示HOG特征金字塔，${F_i}$, $i$>0表示检测目标的各个部件滤波器，$F$₀表示根滤波器，${p_i} = \left({{x_i}, {y_i}, {s_i}} \right)$表示第$i$个滤波器的金字塔层数和位置坐标。第2项表示部件滤波器相对于根滤波器的相对位置的偏移损耗，${\phi _d}\left({{\rm{d}}{x_i}, {\rm{d}}{y_i}} \right)$是形变特征，$\left({{\rm{d}}{x_i}, {\rm{d}}{y_i}} \right)$表示第$i$个部件相对于根滤波器位置的坐标偏移，${d_i}$表示对应部件的变形系数，$b$表示偏差变量，保证不同模型之间的得分可比性。

假设根滤波器的位置为${p_0}$，接下来就要为每个部件滤波器寻找最优位置使得总分最大，即

$s\left( {{p_0}} \right) = {\max\limits_{{p_1}, \cdots , {p_n}}}s\left( {{p_0}, \cdots , {p_n}} \right) $

(2)

DPM模型是应用滑动窗口方法的一个典型算法，该模型使用滑动窗口在整个特征金字塔中计算分值，当根位置的分值超过阈值，说明该目标通过检测，该根模型和所用的部件模型覆盖的区域就是目标的范围。为了提高模型的检测速度，有很多研究者进行改进、优化检测过程，但是由于滑动窗口会检测每一个可能的位置，所以DPM模型的检测速度是较慢的。

2 基于网格密度聚类的检测候选区域生成算法

经典的可变形部件模型(DPM)通过传统的滑动窗口方式获得候选检测窗口来构建特征金字塔。但通过该方式将得到数量繁多的、召回率低的候选窗口，会降低检测精度，增加检测时间。

为解决这一问题，本文提出一种基于网格密度聚类的检测候选区域生成算法来改进可变形部件模型(DPM)的候选检测窗口，具体流程如图 3所示。

图 3 整体流程图

Fig. 3 Flow chart of DPM detection algorithm based on grid density clustering

该算法由4部分组成：1)读取视频流，利用三帧差法和高斯混合模型得到运动目标；2)保存运动目标移动的所有坐标点，采用基于网格密度的聚类方法(G-Cluster)得到潜在的目标区域；3)根据得到的潜在目标区域再次筛选，采用选择性搜索算法进行似物性检测，通过限制尺寸和宽高比对候选窗口进行二次筛选，获得少量的行人候选窗口；4)利用可变形部件模型完成检测目标验证，实现目标准确定位。

2.1 运动目标检测

三帧差分法^[13]可以获取前景目标，且计算复杂度低，满足实时性要求，本文采用三帧差分法获取前景目标，进行动态掩层处理、形态学处理，计算出运动目标轮廓，并获得有效目标的质心坐标，具体实现步骤如下:

1) 采用三帧差分法获取运动目标。当前帧的前景帧差为

$d = \left| {\frac{{{I_L}(x, y, i - 2) + {I_L}(x, y, i - 1)}}{2} - {I_L}(x, y, i)} \right| $

(3)

式中，${I_L}(x, y, i)$，${I_L}(x, y, i - 1)$，${I_L}(x, y, i - 2)$分别为第$i$、$i$-1和$i$-2帧的亮度分量，$d$为帧间图像对应像素点亮度差的绝对值。

2) 对运动目标进行二值化和形态学处理。二值化处理为

$D(x, y) = \left\{ {\begin{array}{*{20}{l}} {255}&{d(x, y)T}\\ 0&{d(x, y) < T} \end{array}} \right. $

(4)

式中，$D(x, y)$表示输出的灰度值，$d(x, y)$表示当前坐标$(x, y)$处的灰度值，$T$为二值化的阈值。阈值的选择关系着二值化的效果，对于光照均匀的图片，使用经典的全局阈值方法能取得不错的效果，但是由于实际应用中的监控场景常会遇到光照不均匀的情况，阈值$T$选取方法采用文献[14]的自适应阈值选取方法，使其对光照不均匀且目标与背景差别较小的弱目标图像的特征有较好的鲁棒性。随后通过膨胀和腐蚀，消除噪点，进一步排除一些无效目标。

3) 获取目标的轮廓，计算得到各个有效目标的质心坐标，保存至Que队列。

2.2 采用G-Cluster算法获取目标频繁运动区域

由于本研究是根据前景图像提取的坐标点来确定密度较大区域，需要借助聚类的方式找出坐标点的最大密集区域，因此采用聚类算法来解决这一问题。聚类算法有很多，大致可分为基于层次的聚类算法、基于划分的聚类算法、基于模型的聚类算法和基于密度^[15]的聚类算法，并且同一组数据使用不同的方法进行聚类分析，常常会得到不同的聚类结果。如图 5(b)所示，根据前景图像提取到的坐标点的分布情况，可知运动目标运动轨迹的坐标点连续性强，并且在某些位置分布比较密集，针对这一情况，运用基于密度的聚类方法最为合适。基于密度的聚类算法有$K$均值聚类算法和DBSCAN(density-based spatial clustering of applications with noise)算法^[16]等，其中DBSCAN算法通过不断地将密度足够高的对象压入簇群中，形成密度相连的最大集合，与其他聚类算法相比，该算法的最大优势在于可以发现不同大小的、任意形状的簇群，但是这一算法的缺点是需要遍历数据集中的每个数据点，时间复杂度高达${\rm{O}}\left({{n^2}} \right)$。由于本研究需要以更快的速度得到聚类的结果，因此结合网格聚类方法^[17-18]思想改进了DBSCAN算法，提出了基于网格的密度聚类算法，即G-Cluster算法。传统的DBSCAN聚类方法需要区分各个簇，所以会从一个没有被访问的数据点开始，在设定的邻域内循环查找密度可达的数据点，进行连接、聚类。但由于本研究仅需将所有网格分为属于簇和不属于簇两种类别，因此采用自适应步长的滑动窗口搜索方式替代邻域查找的方式进行改进，这种方法独立于对象数目，仅依赖于每个网格单元内的数据点，可以大大减少查找次数，提高聚类的速度。G-Cluster算法的具体内容为：将视频图像帧量化为有限数目的网格单元，形成网状结构，通过自适应步长的滑动窗口搜索数据对象，在网格结构的基础上进行聚类，得到最小描述区域。算法涉及以下4个基本定义：

图 5 G-Cluster提取目标频繁运动区域

Fig. 5 G-Cluster method extraction frequent movement area((a)original image; (b)get coordinates of moving target; (c)get frequent movement areas; (d) mask infrequent movement areas)

1) 网格单元。划分后的网格单元表示为${\mathit{\boldsymbol{d}}_i} = d\left({S{t_i}, N{m_i}, D{s_i}} \right)$，结构如图 4所示。其中，$Nm$表示当前网格单元含有的坐标点数量，$Ds$表示网格的稠密状态，$St$表示最小描述区域包含状态。

图 4 网格单元结构图

Fig. 4 Grid unit structure

2) 自适应步长滑动窗口。以3×3网格单元作为滑动窗口，通过自适应步长$L$进行滑动。默认步长为1个网格单元，若当前窗口被判定为簇，则将下一步长设置为3个网格单元。

3) 稠密网格。当一个网格单元${\mathit{\boldsymbol{d}}_i}$内数据点的数量超过阈值$\rho $($\rho = {N_{{\rm{sum}}}}/n$，${N_{{\rm{sum}}}}$为数据点的数量，$n$为网格数量)，就称为稠密网格，则$\mathit{\boldsymbol{d}}\left({D{s_i}} \right) = true$表示稠密网格单元。

4) 簇。稠密网格的最小描述区域，即频繁运动区域。若当前滑动窗口$\mathit{\boldsymbol{b}}$内稠密网格数量超过阈值$T$(经测试，$T$=6时效果最好)，则当前窗口内的网格单元都属于簇，即$\mathit{\boldsymbol{d}}\left({S{t_i}} \right) = \left\{ {{{ true }}|{\mathit{\boldsymbol{d}}_i} \in \mathit{\boldsymbol{b}}} \right\}$。

G-Cluster算法具体可分为以下几个步骤：图像坐标系上网格的划分、将数据分箱处理、稠密网格的识别和网格上聚类。

1) 输入视频图像帧，将图片分割成若干个互不相交的网格单元集合$\mathit{\boldsymbol{D}} = \left\{ {{\mathit{\boldsymbol{d}}_i}|i = 1, 2, \cdots, n} \right\}$，网格单元为${\mathit{\boldsymbol{d}}_i}$。

2) 采用三帧差分法结合高斯混合模型获取运动目标的轮廓，为了避免计算得到的目标频繁区域损失目标的上半部分、下半部分，计算采集轮廓上边界的中心坐标、质心坐标和下边界的中心坐标，存入Que队列，以网格左上角的坐标为参考，若坐标点包含网格单元${\mathit{\boldsymbol{d}}_i}$，则将$d\left({N{m_i}} \right)$加1。

3) 遍历网格单元集合，若$d\left({N{m_i}} \right) \ge \rho $(密度阈值)，则认为该网格是稠密网格，$d\left({D{s_i}} \right)$置为$true$；否则置为$false$。

4) 将滑动窗口$b$从左至右、从上至下扫描。若窗口$b$内稠密网格数量超过阈值$T$，则判定当前窗口$b$内的所有网格属于簇，将窗口内$b$的所有网格单元${d_i}$的$d\left({S{t_i}} \right)$置为$true$，同时将下一次的滑动步长设置为3个网格单元。当扫描结束时，所有$d\left({S{t_i}} \right)$为$true$的网格单元${\mathit{\boldsymbol{d}}_i}$为最小描述区域。

由于选择性搜索方法对纯色区域的采样是无效的，因此将非频繁运动区域填充为纯色，再输入到后续的算法中。具体操作如下：确定最小描述区域后，将不属于簇的网格单元设置黑色掩层，为了避免目标出现在非频繁网格区域造成漏检，本文采用自适应策略。当确定频繁运动网格区域后，若通过三帧差分法获得运动目标的质心坐标出现在非频繁运动网格区域内，且轮廓大于设置的最小目标轮廓$\mathit{\boldsymbol{are}}{\mathit{\boldsymbol{a}}_{\min }}$，则认为是有效目标，获取检测目标的轮廓，将包含轮廓区域的网格单元暂时设置为非频繁运动网格区域，即将网格单元${\mathit{\boldsymbol{d}}_i}$的$d\left({S{t_i}} \right)$置为$false$, 取消设置黑色掩层。G-Cluster方法提取目标频繁运动区域的关键步骤如图 5所示。

2.3 基于选择性搜索的ROI区域选取

传统的物体识别多数是根据滑动窗口的方法，在不同尺度的窗口中以一定的步长扫描整幅图像，往往会得到较多的无用窗口，耗时高、效率低。文献[19]提出一种选择性搜索方法来过滤掉一些无用的窗口，节省大量时间。选择性搜索综合了穷举搜索和图像分割的方法，采取组合策略保证了搜索的精确性，得到数量较少但召回率较高的目标区域，从而缩小了检测范围。但该算法搜索策略中融合了各分割区域之间的颜色、纹理、尺寸等相似度加权匹配，虽然整体搜索精度有所提高，但计算复杂度较高。为解决此问题，本文提出了使用均值感知哈希算法进行相似度计算的方法^[20]。选择性搜索的具体计算过程如下：

1) 输入(彩色)图像，使用文献[21]的方法将图像分割为若干区域$\mathit{\boldsymbol{R}} = \left\{ {{\mathit{\boldsymbol{d}}_1}, {\mathit{\boldsymbol{d}}_2}, \cdots, {\mathit{\boldsymbol{d}}_n}} \right\}$。

2) 初始化相似度集合$\boldsymbol{E}=\boldsymbol{\phi}$，从哈希指纹集合$\mathit{\boldsymbol{H}}$中取出对应的子区域的哈希指纹$c\left({{\mathit{\boldsymbol{d}}_n}} \right)$；计算相邻区域$\left({{\mathit{\boldsymbol{d}}_i}, {\mathit{\boldsymbol{d}}_j}} \right)$的相似度$\left. {{e_i}\left({c\left({{\mathit{\boldsymbol{d}}_i}} \right), c\left({{\mathit{\boldsymbol{d}}_j}} \right)} \right)} \right)$，添加到集合$\mathit{\boldsymbol{E}}$中，相似度集合为

$\mathit{\boldsymbol{E}} = {e_1}\left( {c\left( {{\mathit{\boldsymbol{d}}_1}} \right), c\left( {{\mathit{\boldsymbol{d}}_j}} \right)} \right) \cup \cdots \cup {e_k}\left( {\left( {c\left( {{\mathit{\boldsymbol{d}}_i}} \right), c\left( {{\mathit{\boldsymbol{d}}_j}} \right)} \right)} \right. $

(5)

从相似度集合$\mathit{\boldsymbol{E}}$中找出最大相似度$\max \left({{e_i}} \right)$的相邻区域对$\left({{\mathit{\boldsymbol{d}}_i}, {\mathit{\boldsymbol{d}}_j}} \right)$，将其合并为一个区域${\mathit{\boldsymbol{d}}_t} = {\mathit{\boldsymbol{d}}_i} \cup {\mathit{\boldsymbol{d}}_j}$。然后从相似度集合$\mathit{\boldsymbol{E}}$中剔除原先与区域${\mathit{\boldsymbol{d}}_i}$和${\mathit{\boldsymbol{d}}_j}$相邻区域之间计算的相似度。计算新的${\mathit{\boldsymbol{d}}_t}$区域与其相邻区域的相似度，将结果添加到相似度集合$\mathit{\boldsymbol{E}}$中，同时将${\mathit{\boldsymbol{d}}_t}$区域添加到区域集合$\mathit{\boldsymbol{R}}$中。

4) 重复步骤3)，直至相似度集合$\mathit{\boldsymbol{E}}$为空，即将可合并的区域都已合并完毕。

5) 获取区域集合$\mathit{\boldsymbol{R}}$中每个区域的边界区域，即图像中物体可能位置的集合$\mathit{\boldsymbol{L}}$。

相似度计算采用均值感知哈希算法，将图像子区域转换成灰度图，计算平均灰度值，将子区域内的所有像素点与平均灰度值比较，大于等于平均灰度值则置为1，否则置为0，从而得到哈希指纹。

采用图像的低频信息计算得出哈希指纹，利用了图像的大部分信息集中在低频信息中的特点，通过剔除掉图像的边缘、角点等高频信息，更好地表达图片的特征，与其他感知哈希算法相比，具有更快的运行速度。通过汉明距离对图像子区域的哈希指纹计算得出图像内容相似度。计算哈希特征$M, N$的相似度的汉明距离为

$d(M, N) = \sum\limits_{i = 1}^{{N_{{\rm{conth}}}}} {(mi \oplus ni)} $

(6)

式中，$m, n$分别为哈希特征$M, N$的哈希指纹值，本文取${N_{{\rm{count}}}} = 64$，表示64等级灰度信息；⊕为异或操作。为了进一步排除无用的候选框，本文方法对得到的候选窗口进行非最大抑制操作^[22]，然后通过行人轮廓高宽比的范围来进一步筛选候选对象。INRIA数据集包含3 548幅行人图像，计算各个行人轮廓的高宽比，如图 6所示，显示了各个行人的轮廓的高宽比的范围，即${T_{{\rm{high}}}} \ge 2.4$, ${T_{{\rm{low}}}} \le 1.6$。为降低漏检率，本文方法增加20%的容差，取行人轮廓高宽比范围为${T_{{\rm{high}}}} \ge 2.8$, ${T_{{\rm{low}}}} \le 1.3$。在得到的目标频繁运动区域的基础上，采用选择性搜索结合二次筛选得到的部分结果如图 7所示。可以看出，采用选择性搜索结合均值感知哈希算法获得数量较少但召回率高的候选目标窗口。在提高检测速度，满足实时性要求的同时，能够更精准地定位目标区域，减少检测范围。

图 6 行人轮廓高宽比分布

Fig. 6 Pedestrian contour aspect ratio distribution

图 7 获取候选窗口流程图

Fig. 7 Get candidate detection window ((a) original image; (b) image segmentation; (c) initial candidate detection window; (d) final candidate detection window)

2.4 算法复杂度分析

由于本文算法是对数据空间采用网格化处理，每当获取新的坐标数据都将快速记录到对应网格，完成坐标点的定位操作，时间复杂度为${\rm{O}}$(1)；待坐标点数据采集结束，将进行两次遍历图像网格的操作，第1次遍历是结合密度阈值, 依此判断各个网格是否为稠密网格，时间复杂度为${\rm{O}}\left(n \right)$，$n$为网格单元的数量；在第2次遍历中，设置3×3的网格块作为单元窗口遍历整个图像网格，根据形成簇原则，依此判断当前窗口是否可以合并，即是否形成簇，最终得到目标频繁运动区域，时间复杂度也是${\rm{O}}\left(n \right)$，综合而言，本算法的时间复杂度是${\rm{O}}\left(n \right)$。

3 实现结果与分析

实验环境为Intel(R)Pentium(R)CPU，3.60 GHz主频，8 GB内存，win7操作系统64位，开发平台为Pycharm+opencv3.3.0。文中训练数据选取的是PETS 2009 Bench-mark数据集^[23]的部分视频序列，该数据集为多个场景、不同光照条件下固定摄像头拍摄的视频序列，每帧图像分辨率为768×576像素，该数据集广泛应用于运动目标检测、行人检测以及敏感区域入侵的验证。本文选取场景不拥堵的S2L2视频序列(共795帧，包含8个场景，8个行人)，其中270帧作为本文方法提取目标频繁运动区域的训练集，其余525帧作为测试视频。

3.1 评价指标

实验使用平均精度(AP)、平均对数漏检率(LAMR)、检测速度作为评价的3个指标。其中，平均精度使用检测的窗口和标记的真实窗口之间的交集和并集的比(IOU)作为检测行人是否准确定位的衡量标准，即

${\rm{IOU}} = \frac{{{\mathop{ Area}\nolimits} \left( {{\mathit{\boldsymbol{R}}_d} \cap {\mathit{\boldsymbol{R}}_g}} \right)}}{{{\mathop{ Area}\nolimits} \left( {{\mathit{\boldsymbol{R}}_d} \cup {\mathit{\boldsymbol{R}}_g}} \right)}} $

(7)

式中，${\mathit{\boldsymbol{R}}_d}$表示对同一目标使用非极大值抑制^[19]合并后的窗口，${\mathit{\boldsymbol{R}}_g}$表示数据集标注的检测窗口。当IOU为阈值$I$时，才认为该行人目标被正确检测，本文设置$I$= 50%。平均对数漏检率是指在检测中未被检测出的行人与行人总数的比；检测速度是每秒传输的帧数，即视频的画面数。

3.2 实验结果

实验选取了包括本文方法在内的6种方法以平均精度、检测速度、平均对数漏检率3个指标进行比较，实验结果如表 1所示。

表 1 各算法性能比较
Table 1 Performance comparison of each algorithm

下载CSV

方法	平均精度/%	速度/ (帧/s)	平均对数漏检率/%
HOG + SVM	80.50	2.25	26.45
Haar + AdaBoost	77.60	71.20	28.65
DPM	84.96	0.49	24.86
DPM-选择性搜索	86.32	0.28	22.80
深度学习方法	88.42	0.58	23.02
本文方法	86.67	1.81	22.68
注：加粗表示最优结果，加下划线表示次优结果。

在本实验设置的环境下，传统的DPM检测耗时是HOG+SVM方法^[24]的5倍左右，检测速度非常慢，但是检测的平均精度比该方法高4.46%，同时平均对数漏检率降低了1.59%。Haar + AdaBoost系统^[25]的检测速度虽然很快，但是精度是所有算法中最低的，与传统的DPM相比，检测精度相差7.36%，平均对数漏检率也相差3.79%。DPM-选择性搜索法将传统的DPM与选择性搜索算法相结合进行检测，通过选择性搜索算法筛选得到少量的候选的检测窗口，通过该算法的似物性检测可以得到高召回率的候选窗口，结合DPM算法可以降低漏检率，平均对数漏检率比传统的DPM降低了2.06%，平均精度提高1.36%，但是速度降低为0.28帧/s。尽管选择性搜索可以得到更优质的候选窗口，降低了平均对数漏检率，提高了检测精度，但是由于该算法中相似度计算过于复杂，导致整体的检测速度比传统的DPM降低了57%。深度学习方法^[26]采用卷积神经网络进行行人检测，检测精度为88.42%，高于传统DPM方法3.46%，同时平均对数漏检率比传统的DPM算法降低了1.84%。该方法虽然检测精度、平均对数漏检率高于一般方法，但时间复杂度较高，会在一定程度上降低检测速度，不能满足实时性要求，仅为本文方法检测速度的32%，即本文检测速度是该深度学习方法的检测速度的3.1倍左右。

本文方法通过三帧差分法结合高斯混合模型得到一段时间的运动目标坐标点，使用G-Cluster方法计算得到频繁运动区域，并对该区域进行动态掩层处理，由于选择性搜索利用了同一物体在像素点尺度范围的相似性，并通过训练得到行人轮廓高宽比进行二次筛选，得到少量召回率高的候选检测区域，用于DPM行人检测，可减少检测区域，提高速度和召回率。实验表明，本文算法与传统DPM相比，精度提高了1.71%，比DPM-选择性搜索方法提高了0.35%，平均对数漏检率比传统DPM降低2.2%，检测速度提高为传统DPM的3.7倍左右，是DPM-选择性搜索的6.5倍左右。

本文算法与DPM算法比较的实验示例结果如图 8所示，分别选取第279、347、440、605帧作为测试样本进行分析，绿色框表示检测的结果。

图 8 本文算法与DPM算法比较

Fig. 8 Comparison between our algorithm and DPM algorithm ((a) DPM; (b) ours)

通过对比效果可以看出，第279帧图像中行人被灯柱遮挡，DPM算法出现漏检，虽然本文算法将多余部分也框出来，但是能够检测到行人的大体位置；第347帧图像中的行人发生遮挡，分析原因，虽然DPM算法能够较好地检测出每个行人，但由于选择性搜索算法针对颜色相近且互相遮挡的目标的鲁棒性较弱，本文算法将互相遮挡的两个行人识别为一个行人。后续研究将通过纹理特征与选择性搜索算法的相似度评估函数相融合来解决此类问题。第440帧图像中的行人站位分散，两种算法的检测效果都很优异；第605帧图像中DPM算法出现侧身行人目标的漏检情况，但本文方法可以正确检测到该行人目标，没有发生漏检情况，同时，在非频繁运动区域的行人(如图 8(b)右下图最下方的行人)也可以检测成功，说明动态更新频繁运动区域是有效的。实验表明，对于被遮挡的行人本文算法具有更好的检测效果，与传统的DPM相比，在一定程度上可以降低漏检率，提高检测精度和检测速度。

4 结论

针对经典的DPM算法计算复杂度高、漏检率较高的问题，本文提出一种基于网格密度聚类的行人检测候选域生成算法，首先使用三帧差法、高斯混合模型得到可接受的运动目标、保存质心和垂直边界中心坐标点，借鉴基于网格密度聚类思想，划分视频帧为网格结构，对网格单元进行聚类分析得到目标频繁运动区域。之后引入改进的选择性搜索算法，对目标频繁运动区域进行动态掩层处理后，输入至选择性搜索算法中进行似物性采样，得到候选检测窗口，然后根据INRIA数据集统计的高宽比对候选检测窗口进行精筛选，最后将窗口输入到DPM算法中进行行人检测。

本文算法可以得到数量较少、召回率高的候选检测，作为DPM检测算法的输入，有效地提高了检测精度，降低了漏检率，加快了检测速度，具有实际意义。但是本文算法采用三帧差法和高斯建模检测运动目标，对背景发生偏移情况的稳定性处理速度有待进一步提高。未来可以利用SuBSENSE算法加速背景处理，以期能进一步提高行人检测的性能。

参考文献

[1] Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8.[DOI: 10.1109/CVPR.2008.4587597]

[2] Yang Y, Li S P. Fast object detection with deformable part models and segment locations' hint[J]. Acta Automatica Sinica, 2012, 38(4): 540–548. [杨扬, 李善平. 分割位置提示的可变形部件模型快速目标检测[J]. 自动化学报, 2012, 38(4): 540–548. ] [DOI:10.3724/SP.J.1004.2012.00540]

[3] Wu X T, Kim K Y, Wang G Y, et al. Fast human detection using deformable part model at the selected candidate detection positions[M]//Ciucci D, Wang G Y, Mitra S, et al. Rough Sets and Knowledge Technology. Cham: Springer, 2015: 502-512.[DOI: 10.1007/978-3-319-25754-9_44]

[4] Cheng M M, Zhang Z M, Lin W Y, et al. BING: binarized normed gradients for objectness estimation at 300 fps[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 3286-3293.[DOI: 10.1109/CVPR.2014.414]

[5] Zitnick C L, Dollár P. Edge boxes: locating object proposals from edges[C]//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 391-405.[DOI: 10.1007/978-3-319-10602-1_26]

[6] Pedersoli M, Vedaldi A, Gonzàlez J, et al. A coarse-to-fine approach for fast deformable object detection[J]. Pattern Recognition, 2015, 48(5): 1844–1853. [DOI:10.1016/j.patcog.2014.11.006]

[7] Felzenszwalb P, Girshick R, McAllester D, et al. Visual object detection with deformable part models[J]. Communications of the ACM, 2013, 56(9): 97–105. [DOI:10.1145/2494532]

[8] Song H O, Girshick R, Zickler S, et al. Generalized sparselet models for real-time multiclass object recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(5): 1001–1012. [DOI:10.1109/TPAMI.2014.2353631]

[9] Girshick R, Iandola F, Darrell T, et al. Deformable part models are convolutional neural networks[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 437-446.[DOI: 10.1109/CVPR.2015.7298641]

[10] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 580-587.[DOI: 10.1109/CVPR.2014.81]

[11] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2015: 91-99.

[12] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]//Proceedings of 1998 ACM SIGMOD International Conference on Management of Data. Seattle, Washington, USA: ACM, 1998: 94-105.[DOI: 10.1145/276304.276314]

[13] Han X W, Gao Y, Lu Z, et al. Research on moving object detection algorithm based on improved three frame difference method and optical flow[C]//Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and Control. Qinhuangdao, China: IEEE, 2015: 580-584.[DOI: 10.1109/IMCCC.2015.420]

[14] Guo J, Liu X Y, Wu B, et al. Binarisation method for images acquired under non-uniform illumination[J]. Computer Applications and Software, 2014, 31(3): 183–186, 202. [郭佳, 刘晓玉, 吴冰, 等. 一种光照不均匀图像的二值化方法[J]. 计算机应用与软件, 2014, 31(3): 183–186, 202. ] [DOI:10.3969/j.issn.1000-386x.2014.03.048]

[15] Sun T Y, Sun W, Xue M. Tracking multiple moving objects based on OPTICS and object probability model[J]. Journal of Image and Graphics, 2015, 20(11): 1492–1499. [孙天宇, 孙炜, 薛敏. OPTICS聚类与目标区域概率模型的多运动目标跟踪[J]. 中国图象图形学报, 2015, 20(11): 1492–1499. ] [DOI:10.11834/jig.20151108]

[16] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, Oregon, USA: ACM, 1996: 226-231.

[17] Li G X, Tang J, Yi L, et al. A multi-density grid clustering algorithm to calculate parameters automatically[J]. Computer & Digital Engineering, 2014, 42(7): 1141–1145. [李光兴, 唐俊, 易林, 等. 一种自动计算参数的多密度网格聚类算法[J]. 计算机与数字工程, 2014, 42(7): 1141–1145. ] [DOI:10.3969/j.issn1672-9722.2014.07.007]

[18] Olarte R, Bared J G, Sutherland L F, et al. Density models and safety analysis for rural unsignalised restricted crossing U-turn intersections[J]. Procedia - Social and Behavioral Sciences, 2011, 16: 718–728. [DOI:10.1016/j.sbspro.2011.04.491]

[19] Uijlings J R R, Van De Sande K E A, Gevers T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154–171. [DOI:10.1007/s11263-013-0620-5]

[20] Li Z Y, Zhu M L, Chen Z. Object tracking algorithm based on perception Hash technology[J]. Journal of Image and Graphics, 2015, 20(6): 795–804. [李子印, 朱明凌, 陈柱. 融合图像感知哈希技术的运动目标跟踪[J]. 中国图象图形学报, 2015, 20(6): 795–804. ] [DOI:10.11834/jig.20150609]

[21] Van De Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition[C]//Proceedings of 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1879-1886.[DOI: 10.1109/ICCV.2011.6126456]

[22] Chen J H, Ye X N. Improvement of non-maximum suppression in pedestrian detection[J]. Journal of East China University of Science and Technology:Natural Science Edition, 2015, 41(3): 371–378. [陈金辉, 叶西宁. 行人检测中非极大值抑制算法的改进[J]. 华东理工大学学报:自然科学版, 2015, 41(3): 371–378. ] [DOI:10.3969/j.issn.1006-3080.2015.03.015]

[23] Ferryman J, Shahrokni A. PETS2009: dataset and challenge[C]//Proceedings of the 12th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Snowbird, UT, USA: IEEE, 2009: 1-6.[DOI: 10.1109/PETS-WINTER.2009.5399556]

[24] Ahmed A H, Kpalma K, Guedi A O. Human detection using HOG-SVM, mixture of Gaussian and background contours subtraction[C]//Proceedings of 2017 International Conference on Signal-Image Technology & Internet-Based Systems. Jaipur, India: IEEE, 2017: 334-338.[DOI: 10.1109/SITIS.2017.62]

[25] Monteiro G, Peixoto P, Nunes U. Vision-based pedestrian detection using Haar-like features[J]. Robotica, 2006, 24: 46–50.

[26] Tian Y L, Luo P, Wang X G, et al. Pedestrian detection aided by deep learning semantic tasks[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5079-5087.[DOI: 10.1109/CVPR.2015.7299143]