Pedestrian detection based on improved feature and GPU acceleration
Qi Meibin, Li Ji, Jiang Jianguo, Wang Cichun
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Supported by: National Natural Science Foundation of China (61771180)

# Abstract

Objective With growing attention to public safety and security, video surveillance systems have become increasingly important. Pedestrian detection is the first step in analyzing pedestrian behavior in such systems, so it plays a critical role in computer vision research. Although pedestrian detection has advanced in recent years, both speed and accuracy leave room for improvement. On the one hand, pedestrian detection is time consuming because it requires extensive computation on high-dimensional features. On the other hand, detection results are easily affected by environmental factors such as changes in illumination and background, varying pedestrian postures, and occlusion among pedestrians. Because detection results depend on the performance of the feature set and the classifier, the features must be representative enough to distinguish a pedestrian from other objects and the background. This study proposes a pedestrian detection algorithm based on improved features and graphics processing unit (GPU) acceleration to reduce detection time and improve the detection rate.

Method First, the Canny operator is applied to the original images to obtain images with enhanced edge information. Second, the images are processed at three scales to reduce background interference and the deformation caused by uniform normalization. Third, each image is divided into six regions to address occlusion among pedestrians: head, left arm, upper body, right arm, left leg, and right leg, chosen according to the characteristics of pedestrian motion. The scale invariant local ternary pattern (SILTP) feature is then used instead of the local binary pattern (LBP) feature to improve robustness to low resolution and varying illumination. The SILTP feature is extracted in parallel on the GPU as the texture feature to reduce computation time. At the same time, the gradient magnitude and direction of the six regions are computed on the GPU, and the gradient values are weighted according to the distribution of gradient directions. This yields an improved histogram of oriented gradient (HOG) feature with 180 dimensions, far fewer than the traditional HOG feature, which reduces computation time. Finally, the HOG and SILTP features extracted at the three scales are concatenated. All features are transferred from GPU memory to central processing unit (CPU) memory, and pedestrian detection is performed by a linear support vector machine (SVM) classifier.

Result The proposed algorithm is evaluated on two datasets, INRIA and NICTA. The INRIA dataset is currently the most widely used static pedestrian dataset; its backgrounds, pedestrian postures, and occlusions are complex. The NICTA dataset, by contrast, contains a large number of pedestrian images of different sizes. Detection results on these two datasets are therefore representative. The INRIA dataset contains 2 416 positive and 1 218 negative training samples and 1 126 positive and 453 negative test samples. The NICTA dataset contains 142 598 positive and 90 605 negative training samples and 34 416 positive and 42 800 negative test samples. The proposed method achieves detection rates of 99.80% and 99.91% on the INRIA and NICTA datasets, respectively. On the INRIA dataset, the acceleration ratio is 12.19 relative to the algorithm based on traditional HOG and LBP, and the acceleration ratio of feature extraction is at least 8.19. On the NICTA dataset, the acceleration ratio is 13.49 relative to the algorithm based on traditional HOG and LBP. The proposed algorithm therefore raises the detection rate and reduces the detection time.

Conclusion Experimental results show that the proposed algorithm based on improved features and GPU acceleration outperforms the compared algorithms in both accuracy and speed. The improved feature is robust to changes in illumination and environment and performs well under occlusion among pedestrians. Its lower dimension speeds up detection while still capturing accurate pedestrian information. The proposed algorithm suits most pedestrian detection scenarios, especially images or videos with varying illumination and occlusion. It is also fast, particularly when the workload is large and highly repetitive. The algorithm achieves effective and fast pedestrian detection and has practical value.

# Key words

pedestrian detection; graphics processing unit (GPU) acceleration; scale invariant local ternary pattern (SILTP) feature; histogram of oriented gradient (HOG) feature; support vector machine (SVM) classifier

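# 1.2 SILTP feature extraction

The SILTP feature is an improved LBP descriptor. It largely overcomes the histogram instability caused by illumination and adapts well to sudden lighting changes. It is also robust to noise over local regions, in particular shadows and local noise. The experiments therefore concatenate the SILTP feature with the HOG feature for pedestrian detection. The SILTP operator represents pixel texture with the codes 00, 01, and 10, using one more bit than the LBP operator. Let the center pixel of an image block be at $({x_{\rm{c}}}, {y_{\rm{c}}})$ with gray value ${I_{\rm{c}}}$. The encoding function is defined as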

 ${s_\tau }({I_{\rm{c}}}, {I_k}) = \left\{ \begin{array}{ll} 01, & {I_k} > (1 + \tau ){I_{\rm{c}}}\\ 10, & {I_k} < (1-\tau ){I_{\rm{c}}}\\ 00, & \text{otherwise} \end{array} \right.$ (1)

 ${\rm{SILTP}}_{Q, R}^\tau ({x_{\rm{c}}}, {y_{\rm{c}}}) = \mathop \oplus \limits_{k = 0}^{Q-1} {s_\tau }({I_{\rm{c}}}, {I_k})$ (2)
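Eqs. (1) and (2) can be sketched in plain Python as follows. The sketch assumes the usual SILTP convention that ${I_k}$ are the $Q$ neighbors at radius $R$ around the center pixel and that $\oplus$ concatenates the two-bit codes. The function names, the neighbor ordering, and the value $\tau = 0.3$ are illustrative choices, not taken from the paper.

```python
# Offsets (dy, dx) of the Q = 8 neighbors; the ordering is an assumption.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def siltp_code(i_c, i_k, tau):
    """Two-bit code s_tau(I_c, I_k) from Eq. (1), returned as a string."""
    if i_k > (1 + tau) * i_c:
        return "01"
    if i_k < (1 - tau) * i_c:
        return "10"
    return "00"


def siltp(image, y, x, tau=0.3, radius=1):
    """Concatenate the codes of all neighbors of pixel (y, x), as in Eq. (2)."""
    i_c = image[y][x]
    return "".join(
        siltp_code(i_c, image[y + dy * radius][x + dx * radius], tau)
        for dy, dx in NEIGHBORS_8
    )


def scale(image, factor):
    """Multiply every gray value by a constant (a global brightness change)."""
    return [[v * factor for v in row] for row in image]


# Hypothetical 3x3 patch with center value 100.
patch = [
    [100, 140, 100],
    [60, 100, 100],
    [100, 100, 135],
]
code = siltp(patch, 1, 1)
print(code)                         # 16 bits: 2 bits for each of 8 neighbors
print(siltp(scale(patch, 2), 1, 1)) # same code after doubling the brightness
```

Because both thresholds in Eq. (1) scale with ${I_{\rm{c}}}$, multiplying all gray values by a constant leaves the code unchanged, which is the property that makes the descriptor tolerant of illumination changes.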

# 1.3 Extraction of the improved HOG feature

 $\left\{ \begin{array}{l} {G_x}(x, y) = I(x + 1, y)-I(x-1, y)\\ {G_y}(x, y) = I(x, y + 1)-I(x, y - 1) \end{array} \right.$ (3)

The gradient magnitude $G(x, y)$ and gradient direction $\alpha (x, y)$ at $(x, y)$ are

 $G(x, y) = \sqrt {{G_x}{{(x, y)}^2} + {G_y}{{(x, y)}^2}}$ (4)

 $\alpha (x, y) = {\tan ^{-1}}\left( {\frac{{{G_y}(x, y)}}{{{G_x}(x, y)}}} \right)$ (5)
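The per-pixel gradient computation can be sketched as below, taking the direction as the angle of the vector $({G_x}, {G_y})$. Using `math.atan2` to avoid division by zero when ${G_x} = 0$ and folding the angle into $[0°, 180°)$ are assumptions of this sketch; the function name and the test image are hypothetical.

```python
import math


def gradient(image, x, y):
    """Central-difference gradient at (x, y); image is indexed as image[y][x]."""
    gx = image[y][x + 1] - image[y][x - 1]   # Eq. (3), horizontal
    gy = image[y + 1][x] - image[y - 1][x]   # Eq. (3), vertical
    magnitude = math.hypot(gx, gy)           # Eq. (4)
    # Unsigned direction in degrees, folded into [0, 180).
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, angle


# Hypothetical 3x3 image chosen so that Gx = 3 and Gy = 4 at the center.
image = [
    [0, 0, 0],
    [0, 0, 3],
    [0, 4, 0],
]
magnitude, angle = gradient(image, 1, 1)
print(magnitude, angle)
```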

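Table 1 Detection rate of HOG with different feature dimensions

| Dimensions | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detection rate/% | 83.66 | 86.31 | 87.99 | 88.42 | 88.96 | 89.31 | 91.60 | 91.42 | 91.17 | 90.96 | 90.74 | 90.90 |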

 $\boldsymbol{F} = \{ {H_i} \times {t_i}{\rm{|}}1 \le i \le 10, i \in {\boldsymbol{\rm{N}}}\}$ (6)

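Here ${t_i}$ is the weight of the $i$-th quantization interval of the gradient direction, and it weights the gradient value ${H_i}$ of that interval. The weighted features of the six regions are then concatenated to give the HOG feature at one scale. The features at the other two scales are extracted in the same way, yielding the complete 180-dimensional HOG feature (10 intervals × 6 regions × 3 scales).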
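A minimal sketch of Eq. (6) and of the resulting feature length follows. It assumes that ${H_i}$ accumulates gradient magnitudes over 10 equal intervals of $[0°, 180°)$, as in standard HOG. The weights here are placeholder values, whereas in the paper they come from the gradient direction statistics of the pedestrian regions; all function names are hypothetical.

```python
N_BINS = 10     # orientation intervals per region, as in Eq. (6)
N_REGIONS = 6   # regions per sample
N_SCALES = 3    # scales per sample


def weighted_histogram(magnitudes, angles, weights):
    """Return [H_i * t_i] for one region.

    magnitudes, angles: per-pixel gradient magnitude and direction in [0, 180).
    weights: the N_BINS weights t_i (placeholders in this sketch).
    """
    bin_width = 180.0 / N_BINS
    hist = [0.0] * N_BINS
    for g, a in zip(magnitudes, angles):
        i = min(int(a // bin_width), N_BINS - 1)
        hist[i] += g
    return [h * t for h, t in zip(hist, weights)]


def concatenate(features):
    """Join per-region (or per-scale) feature lists into one vector."""
    return [v for feat in features for v in feat]


# Toy region with three pixels; bin 1 is down-weighted by a placeholder t_1 = 0.5.
weights = [1.0, 0.5] + [1.0] * 8
region = weighted_histogram([1.0, 2.0, 3.0], [5.0, 20.0, 179.0], weights)

# Reusing the same toy region everywhere just to show the final length.
one_scale = concatenate([region] * N_REGIONS)
descriptor = concatenate([one_scale] * N_SCALES)
print(len(descriptor))  # 10 * 6 * 3
```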

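# 1.4 GPU parallel configuration for the SILTP and HOG features

SILTP feature extraction operates pixel by pixel. In HOG, Gamma correction processes each pixel on its own, and the horizontal and vertical gradients are computed independently, so the work can be split by pixel and run in parallel. Although the amount of computation is large, most of it consists of basic operations that are simple and highly repetitive, which makes it well suited to parallel execution. GPU acceleration is therefore used, with one thread assigned to each pixel. The thread block size is set to 16×16 to maximize utilization. For a 128×64 pixel image, for example, 128×64 threads are requested, giving 8×4 thread blocks.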
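The block count in this example follows from a ceiling division of the image size by the block size. The helper below is a host-side sketch only; its name and the rounding-up behavior for sizes that are not multiples of 16 are assumptions, since the paper gives only the 128×64 case.

```python
def launch_config(height, width, block=(16, 16)):
    """Grid size and total thread count for one thread per pixel."""
    grid = (-(-height // block[0]), -(-width // block[1]))  # ceiling division
    threads = grid[0] * block[0] * grid[1] * block[1]
    return grid, threads


grid, threads = launch_config(128, 64)
print(grid, threads)  # 8 x 4 blocks, one thread for each of the 128 * 64 pixels

# A size that is not a multiple of 16 launches spare threads,
# so a kernel would need a bounds check before touching a pixel.
grid_odd, threads_odd = launch_config(100, 50)
print(grid_odd, threads_odd - 100 * 50)
```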

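Although the GPU has strong computing power, it also incurs a series of latencies. Data transfer between host and device is very time consuming and hard to avoid, and it affects the speed of the whole algorithm. Data transfers must therefore be reduced, and stream techniques are used to hide the latency of data copies. Before the GPU task starts, space for the whole sample set is allocated, a large batch of data to be detected is copied to device memory in one transfer, and the results are output to the CPU in one transfer after computation finishes, avoiding frequent data exchange.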
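The single-transfer idea can be illustrated on the host side: samples are packed into one contiguous buffer with an offset table, so that one copy replaces many small ones. This is only a sketch of the packing step with hypothetical names. The actual upload (for instance a single `cudaMemcpy`, or `cudaMemcpyAsync` on a stream) is not shown and is not described in detail in the paper.

```python
def pack_samples(samples):
    """Pack byte-string samples into one buffer and record where each starts."""
    buffer = bytearray()
    offsets = []
    for sample in samples:
        offsets.append(len(buffer))
        buffer.extend(sample)
    return bytes(buffer), offsets


# Two hypothetical samples of different sizes.
samples = [bytes([1, 2, 3]), bytes([4, 5])]
buffer, offsets = pack_samples(samples)
print(len(buffer), offsets)  # one buffer to upload, plus per-sample offsets
```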

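# 1.5 Pedestrian detection procedure

1) Preprocess the pedestrian samples with the Canny operator and divide each sample into six regions; compute the distribution of gradient directions in the six regions to obtain the corresponding gradient weights.

2) Input the samples, concatenate the samples to be detected, upload them to GPU memory, and normalize them to three scales.

3) On the GPU, at each of the three scales, extract the HOG features of the six regions and weight them by the weights of the corresponding gradient direction intervals to obtain the weighted HOG feature. Extract the SILTP feature at the same time.

4) Concatenate the extracted SILTP and HOG features and output them to CPU memory.

5) Feed the extracted features into the pedestrian detection classifier trained by SVM to perform pedestrian detection.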
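The final classification step can be sketched with the standard decision function of a linear SVM, $f(\boldsymbol{x}) = \boldsymbol{w} \cdot \boldsymbol{x} + b$, where a positive score is read as "pedestrian". The weight vector, bias, and two-dimensional features below are made-up values for illustration; a trained model would supply them, and the real feature vector is the concatenated HOG and SILTP feature.

```python
def svm_decision(features, weights, bias):
    """Linear SVM score w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_pedestrian(features, weights, bias):
    """Classify by the sign of the decision score."""
    return svm_decision(features, weights, bias) > 0


# Hypothetical trained parameters and two toy feature vectors.
weights, bias = [0.5, -1.0], 0.25
print(is_pedestrian([1.0, 0.5], weights, bias))
print(is_pedestrian([0.0, 1.0], weights, bias))
```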

# 2 Experiments and results

 ${R_{{\rm{TP}}}} = \frac{{{D_{{\rm{TP}}}}}}{{{D_{{\rm{TP}}}} + {D_{{\rm{FN}}}}}}$ (7)

 ${R_{{\rm{FP}}}} = \frac{{{D_{{\rm{FP}}}}}}{{{D_{{\rm{FP}}}} + {D_{{\rm{TN}}}}}}$ (8)

 ${R_{{\rm{MISS}}}} = \frac{{{D_{{\rm{FN}}}}}}{{{D_{{\rm{TP}}}} + {D_{{\rm{FN}}}}}}$ (9)
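The three rates can be computed directly from confusion-matrix counts. The sketch assumes the usual reading of the subscripts: true positives, false negatives, false positives, and true negatives. The counts in the example are invented for illustration and are not results from the paper.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (R_TP, R_FP, R_MISS) as defined in Eqs. (7) to (9)."""
    r_tp = tp / (tp + fn)      # detection rate
    r_fp = fp / (fp + tn)      # false detection rate
    r_miss = fn / (tp + fn)    # miss rate
    return r_tp, r_fp, r_miss


# Hypothetical counts: 100 pedestrians and 100 non-pedestrians.
r_tp, r_fp, r_miss = detection_metrics(tp=90, fn=10, fp=5, tn=95)
print(r_tp, r_fp, r_miss)
```

Since Eqs. (7) and (9) share a denominator, the miss rate is always one minus the detection rate.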

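# 2.1.1 Experimental results on the INRIA dataset

The INRIA dataset is currently the most widely used static pedestrian detection database. It includes complex backgrounds, changes in light intensity, and pedestrian occlusion, so detection is difficult. The training set contains 2 416 positive samples and 1 218 negative samples, and the test set contains 1 126 positive samples and 453 negative samples. Detection covers all pedestrian targets.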

Table 2 Detection rate with and without the Canny operator

| Canny operator | Not used | Used |
| --- | --- | --- |
| Detection rate/% | 97.42 | 99.80 |

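Table 3 Detection rate of the algorithm with different texture features

| Texture feature | None | LBP | SILTP (243 dimensions) | SILTP (486 dimensions) | SILTP (1 458 dimensions) |
| --- | --- | --- | --- | --- | --- |
| Detection rate/% | 91.26 | 96.87 | 99.80 | 98.74 | 98.21 |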

Table 4 Detection rate of the algorithm with different division methods

| Division | No division | 3 regions | 5 regions | 6 regions |
| --- | --- | --- | --- | --- |
| Detection rate/% | 95.72 | 97.69 | 98.04 | 99.80 |

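Table 5 Detection results of different algorithms on INRIA

| Algorithm | Detection rate/% | False detection rate/% |
| --- | --- | --- |
| HOG | 91.01 | 5.67 |
| LBP | 95.36 | 5.16 |
| HOG+LBP | 97.32 | 4.73 |
| Proposed | 99.80 | 3.91 |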

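# 2.1.2 Experimental results on the NICTA dataset

NICTA is a large static pedestrian database containing 25 551 pedestrian samples and 5 207 non-pedestrian samples. Because samples at 64×80 pixels carry too much information, the 32×80 pixel samples are used. The training set contains 142 598 positive samples and 90 605 negative samples, and the test set contains 34 416 positive samples and 42 800 negative samples.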

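Table 6 Detection results of different algorithms on NICTA

| Algorithm | Detection rate/% | False detection rate/% |
| --- | --- | --- |
| HOG | 92.82 | 5.40 |
| LBP | 95.23 | 5.19 |
| HOG+LBP | 97.82 | 4.68 |
| Proposed | 99.91 | 3.61 |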

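Table 7 Miss rate of different algorithms

| Algorithm | Miss rate/% at false detection rate $10^{-3}$ | Miss rate/% at false detection rate $10^{-2}$ |
| --- | --- | --- |
| HOG | 0.088 | 0.033 |
| LBP | 0.159 | 0.052 |
| Improved HOG | 0.141 | 0.065 |
| HOG+LBP | 0.092 | 0.018 |
| Improved HOG+LBP | 0.066 | 0.010 |
| Proposed | 0.052 | 0.007 |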

# 2.2.1 Experimental results on the INRIA dataset

Table 8 Feature extraction time at different resolutions

| Resolution/pixels | CPU time/s | GPU time/s | Acceleration ratio |
| --- | --- | --- | --- |
| 320×240 | 780.71 | 95.31 | 8.19 |
| 640×480 | 3 036.17 | 298.04 | 10.19 |
| 1 280×720 | 10 946.79 | 770.19 | 15.23 |
| 1 980×1 080 | 29 063.27 | 1 570.23 | 18.51 |

Table 9 Detection time of different algorithms on INRIA

| Algorithm | Detection time/s | Acceleration ratio |
| --- | --- | --- |
| HOG | 94.49 | 4.10 |
| LBP | 149.78 | 6.50 |
| HOG+LBP | 280.85 | 12.18 |
| Proposed | 23.05 | 1.00 |
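The acceleration ratios in Table 9 are each algorithm's detection time divided by the detection time of the proposed method, so the proposed method itself has a ratio of 1.00. The short check below reproduces them from the listed times; the dictionary keys and the function name are illustrative.

```python
# Detection times in seconds, copied from Table 9.
detection_time = {"HOG": 94.49, "LBP": 149.78, "HOG+LBP": 280.85, "Proposed": 23.05}


def acceleration_ratios(times, baseline="Proposed"):
    """Time of each algorithm relative to the baseline, rounded to 2 decimals."""
    return {name: round(t / times[baseline], 2) for name, t in times.items()}


ratios = acceleration_ratios(detection_time)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```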

# 2.2.2 Experimental results on the NICTA dataset

Table 10 Detection time of different algorithms on NICTA

| Algorithm | Detection time/s | Acceleration ratio |
| --- | --- | --- |
| HOG | 427.56 | 5.12 |
| LBP | 558.38 | 6.68 |
| HOG+LBP | 1 126.42 | 13.49 |
| Proposed | 83.49 | 1.00 |

