Published: 2018-12-16 | DOI: 10.11834/jig.180299 | 2018, Volume 23, Number 12 | Image Analysis and Recognition

Received: 2018-05-10; revised: 2018-07-07. Supported by: National Natural Science Foundation of China (U1713216, 61701101, U1613214); Shenyang Science Research Foundation (17-87-0-00). First author: Xu Ming, born 1991, male, Ph.D. candidate; research interest: computer vision. E-mail: xuming_neu@foxmail.com. Yu Xiaosheng, male, lecturer; research interests: level-set-based image segmentation, indoor localization for wireless sensor networks, artificial intelligence and pattern recognition. E-mail: yuxiaosheng@mail.neu.edu.cn. Chen Dongyue, male, associate professor; research interests: computer vision, pattern recognition, machine learning, augmented reality. E-mail: chendongyue@ise.neu.edu.cn. Jia Tong, male, professor; research interests: 3D panoramic visual perception for intelligent robots, industrial visual measurement, computer-aided diagnosis of medical images, deep learning. E-mail: jiatong@ise.neu.edu.cn. Ru Jingyu, male, Ph.D. candidate; research interest: wireless sensor networks. E-mail: rujingyu@hotmail.com. CLC number: TP37. Document code: A. Article ID: 1006-8961(2018)12-1829-09


Pedestrian detection in complex thermal infrared surveillance scene
Xu Ming1, Yu Xiaosheng2, Chen Dongyue1, Wu Chengdong2, Jia Tong1, Ru Jingyu1
1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China;
2. Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China
Supported by: National Natural Science Foundation of China (U1713216, 61701101, U1613214)

# Abstract

**Objective** Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a crucial task in several practical applications, such as public security management, disaster relief, and intelligent surveillance. Existing thermal infrared pedestrian detection algorithms generally consist of two steps. The first step generates several regions of interest (ROIs) in the thermal infrared image that are suspected of containing human targets. The second step then verifies whether each ROI is a human target. Verification can be conducted by extracting features from the ROIs and feeding them to a classifier, and the classification task can be combined with feature extraction by adopting a deep learning method. However, in their first step, most existing thermal infrared pedestrian detection algorithms rely heavily on the assumption that the gray value of the human target is higher than that of the environment, which renders them ineffective at high ambient temperatures. Gray value inversion occurs as the ambient temperature rises; that is, the gray value of the environment in the thermal infrared image becomes higher than that of the human target, which reduces the accuracy of the pedestrian detection algorithm. On this basis, a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed, which aims to improve the robustness of pedestrian detection systems in thermal infrared surveillance scenes and to achieve better detection accuracy. **Method** In the proposed algorithm, frequency domain saliency detection is first employed to generate a saliency map that covers all pedestrian targets in the original thermal infrared image.
Unlike existing methods, the saliency-based detection depends on the saliency of human targets rather than on their gray values. The generation of the subsequent ROI map is therefore not limited by the assumption that the gray value of the human target is high, which avoids the detection errors caused by the failure of this assumption at high ambient temperatures. In addition, the algorithm generates one full-size saliency map rather than several sub-regions. A fully convolutional network is then constructed, with the ROI map generated from the saliency map and the original thermal infrared image as the network input and the pedestrian target probability map as the network output. The network consists of two parts. The first part, which mainly follows the AlexNet and VGG network structures, can be regarded as a feature extraction module. The second part is a probability generation module consisting of three deconvolution layers with kernels of two sizes. A sigmoid activation function is used in the last layer to generate the probability map of pedestrian targets, and the remaining layers use the ReLU activation function. The proposed thermal infrared pedestrian detection algorithm is trained to produce the pedestrian probability map and thereby detect pedestrian targets. **Result** The Ohio State University (OSU) thermal pedestrian database in the OTCBVS infrared video benchmark, which was also established by OSU, is employed to verify the algorithm, and the proposed algorithm is compared with five existing mature algorithms.
The database contains 10 sequences captured from a single surveillance viewpoint under several weather conditions, such as sunny, cloudy, and rainy days, which enables a comprehensive test of pedestrian detection algorithms. In addition to the methods that are not based on convolutional neural networks, the performance of a region-based convolutional neural network is plotted. The results show that the proposed algorithm accurately detects pedestrian targets under various environmental conditions, and several sample detection results are shown. Taking the miss rate versus false positives as the basis for comparison, the proposed algorithm achieves an average miss rate of 7% and outperforms both the existing thermal infrared pedestrian detection methods and the basic deep learning-based object detection methods. The proposed algorithm achieves a high detection rate and shows better robustness to gray value inversion in thermal infrared images. In the detection process, it removes non-pedestrian targets and detects most pedestrians in thermal images, even in complex scenes, such as those containing other heat sources (e.g., street lights) or captured in daytime. **Conclusion** A fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed for thermal infrared surveillance scenes. In the first step, a saliency detection method, which is robust to the gray value inversion that occurs at high ambient temperatures, such as in hot summers or in daytime, is employed to generate a full-size ROI map. A fully convolutional network is then used to output the probability map of pedestrian targets. The proposed algorithm is trainable and avoids generating many sub-regions, which makes it efficient without requiring redundant computation or storage.
Experimental results show that the proposed method improves the robustness of pedestrian detection in various complex scenes and obtains a high pedestrian detection rate. The results also verify that the proposed method enhances the detection of pedestrian targets in thermal infrared surveillance systems.

# Key words

computer vision; thermal infrared surveillance; pedestrian detection; saliency detection; fully convolutional network (FCN)

# 2 Thermal infrared pedestrian detection algorithm

1) Perform saliency detection on the original thermal infrared image; this paper adopts the saliency detection algorithm based on the phase spectrum of the Fourier transform (PFT) in the frequency domain;

2) Use the saliency map as a mask to generate the ROI map from the original thermal infrared image;

3) Construct a fully convolutional network that takes the ROI map as input and outputs the pedestrian target probability map;

4) Generate bounding boxes for pedestrian targets from the probability map, completing pedestrian detection in the thermal infrared image.
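The four steps above can be sketched end to end as follows (a minimal NumPy sketch; the mean-based mask threshold and the placeholder saliency/network functions are illustrative assumptions, not the trained components described in Sections 2.1 and 2.2):

```python
import numpy as np

def detect_pedestrians(ir_image, saliency_fn, fcn_fn, prob_thresh=0.5):
    """Steps 1)-4): saliency map -> ROI map -> FCN -> detection mask."""
    # 1) frequency-domain saliency map, same size as the input image
    saliency = saliency_fn(ir_image)
    # 2) use the saliency map as a mask to build the ROI map
    roi_map = ir_image * (saliency > saliency.mean())
    # 3) the fully convolutional network maps the ROI map to a
    #    pedestrian probability map of the same size
    prob_map = fcn_fn(roi_map)
    # 4) threshold the probability map; bounding boxes would then be
    #    drawn around the connected regions of this mask
    return prob_map > prob_thresh

# toy run with identity placeholders for the saliency and network stages
frame = np.random.rand(240, 360)
mask = detect_pedestrians(frame, lambda x: x, lambda x: x)
```

The full-size probability map avoids generating and storing many sub-regions, which is the efficiency argument made in the abstract.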

# 2.1 ROI map generation

For an input image $I(x, y)$, the Fourier transform and its phase spectrum are

$f(u, v) = F(I(x, y))$ (1)

$p(u, v) = P(f(u, v))$ (2)

where $F(\cdot)$ denotes the 2D Fourier transform and $P(\cdot)$ extracts the phase spectrum. The saliency map is reconstructed from the phase spectrum alone, with a smoothing kernel $U_d$:

$M_s(x, y) = U_d * \left\| F^{-1}\left( \mathrm{e}^{j \cdot p(u, v)} \right) \right\|^2$ (3)
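Equations (1)-(3) map directly onto standard FFT routines. A minimal sketch, assuming NumPy/SciPy and a Gaussian low-pass kernel for $U_d$ (the kernel width `sigma` is an illustrative choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(image, sigma=3.0):
    """Saliency map of Eqs. (1)-(3) via the phase spectrum."""
    f = np.fft.fft2(image)            # Eq. (1): 2D Fourier transform
    p = np.angle(f)                   # Eq. (2): phase spectrum p(u, v)
    # Eq. (3): reconstruct from the phase alone, take the squared
    # magnitude, and smooth with a Gaussian kernel acting as U_d
    recon = np.fft.ifft2(np.exp(1j * p))
    sal = gaussian_filter(np.abs(recon) ** 2, sigma)
    return sal / (sal.max() + 1e-12)  # normalize to [0, 1]

# bright patch on a uniform background: the patch region stands out
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
sal = pft_saliency(img)
```

Because the phase-only reconstruction discards the amplitude spectrum, the response depends on where the image structure is, not on whether targets are brighter or darker than the background, which is exactly the property exploited against gray value inversion.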

# 2.2 Fully convolutional network model

$M_p^{\rm gt}(x, y) = \begin{cases} 1 & (x, y) \in \boldsymbol{\Omega}_p \\ 0 & \text{otherwise} \end{cases}$ (4)

where $\boldsymbol{\Omega}_p$ denotes the set of pixels covered by pedestrian targets.

$L = -\frac{1}{N}\sum\limits_{x, y} \left[ M_p^{\rm gt} \ln \left( M_p \right) + \left( 1 - M_p^{\rm gt} \right) \ln \left( 1 - M_p \right) \right]$ (5)

where $M_p$ is the pedestrian probability map output by the network and $N$ is the number of pixels.
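The training loss is the pixel-wise binary cross-entropy between the network output and the ground-truth map of Eq. (4). A minimal NumPy sketch (the clipping constant `eps` is an implementation detail added here to avoid `log(0)`):

```python
import numpy as np

def bce_loss(prob_map, gt_map, eps=1e-7):
    """Pixel-wise binary cross-entropy of Eq. (5).

    prob_map -- network output M_p, values in (0, 1)
    gt_map   -- binary ground-truth map M_p^gt of Eq. (4)
    """
    p = np.clip(prob_map, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(gt_map * np.log(p) + (1.0 - gt_map) * np.log(1.0 - p))

# a maximally uncertain prediction (0.5 everywhere) costs ln(2) per pixel
loss = bce_loss(np.full((4, 4), 0.5), np.zeros((4, 4)))
```

This loss is well matched to the sigmoid output layer: each pixel is treated as an independent binary classification, so the network can be trained on the full-size probability map directly.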

# 3.2 Pedestrian detection performance in thermal infrared scenes

The OSU thermal pedestrian database contains a total of 284 thermal infrared frames at a resolution of 360×240 pixels. A random 60% of the frames form the training set, which is enlarged by horizontal flipping, an augmentation that does not distort pedestrian targets, to improve training. The remaining frames form the test set used to evaluate the detection algorithm.
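The split and augmentation described above can be sketched as follows (placeholder arrays stand in for the actual OSU frames; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed
perm = rng.permutation(284)                 # 284 frames in the OSU database
n_train = int(0.6 * 284)                    # random 60% for training
train_idx, test_idx = perm[:n_train], perm[n_train:]

def augment(images):
    """Double the training set by flipping each frame horizontally."""
    return list(images) + [img[:, ::-1] for img in images]

# placeholder 360x240-pixel frames (height x width) stand in for real data
train_imgs = augment([np.zeros((240, 360)) for _ in train_idx])
```

Flipping along the width axis preserves pedestrian appearance, so the training set doubles without introducing unrealistic samples.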

# References

• [1] Ma Y L, Wu X K, Yu G Z, et al. Pedestrian detection and tracking from low-resolution unmanned aerial vehicle thermal imagery[J]. Sensors, 2016, 16(4): 446. [DOI:10.3390/s16040446]
• [2] Lee J H, Choi J S, Jeon E S, et al. Robust pedestrian detection by combining visible and thermal infrared cameras[J]. Sensors, 2015, 15(5): 10580–10615. [DOI:10.3390/s150510580]
• [3] Zhang L, Wu B, Nevatia R. Pedestrian detection in infrared images based on local shape features[C]//Proceeding of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN, USA: IEEE, 2007: 1-8.[DOI: 10.1109/CVPR.2007.383452]
• [4] Biswas S K, Milanfar P. Linear support tensor machine with LSK channels:pedestrian detection in thermal infrared images[J]. IEEE Transactions on Image Processing, 2017, 26(9): 4229–4242. [DOI:10.1109/TIP.2017.2705426]
• [5] Li J F, Gong W G, Li W H, et al. Robust pedestrian detection in thermal infrared imagery using the wavelet transform[J]. Infrared Physics & Technology, 2010, 53(4): 267–273. [DOI:10.1016/j.infrared.2010.03.005]
• [6] Qi B, John V, Liu Z, et al. Pedestrian detection from thermal images:a sparse representation based approach[J]. Infrared Physics & Technology, 2016, 76: 157–167. [DOI:10.1016/j.infrared.2016.02.004]
• [7] Wang J T, Chen D B, Chen H Y, et al. On pedestrian detection and tracking in infrared videos[J]. Pattern Recognition Letters, 2012, 33(6): 775–785. [DOI:10.1016/j.patrec.2011.12.011]
• [8] Lin C F, Chen C S, Hwang W J, et al. Novel outline features for pedestrian detection system with thermal images[J]. Pattern Recognition, 2015, 48(11): 3440–3450. [DOI:10.1016/j.patcog.2015.04.024]
• [9] Ostovar A, Hellström T, Ringdahl O. Human detection based on infrared images in forestry environments[C]//Proceeding of the 13th International Conference on Image Analysis and Recognition. Póvoa de Varzim, Portugal: Springer, 2016.[DOI: 10.1007/978-3-319-41501-7_20]
• [10] Lakshmi A, Faheema A G J, Deodhare D. Pedestrian detection in thermal images:an automated scale based region extraction with curvelet space validation[J]. Infrared Physics & Technology, 2016, 76: 421–438. [DOI:10.1016/j.infrared.2016.03.012]
• [11] Zhao X Y, He Z X, Zhang S Y, et al. Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification[J]. Pattern Recognition, 2015, 48(6): 1947–1960. [DOI:10.1016/j.patcog.2014.12.013]
• [12] Budzan S. Human detection in low resolution thermal images based on combined HOG classifier[C]//Proceeding of the International Conference on Computer Vision and Graphics. Warsaw, Poland: Springer, 2016: 304-315.[DOI: 10.1007/978-3-319-46418-3_27]
• [13] Cai Y F, Liu Z, Wang H, et al. Saliency-based pedestrian detection in far infrared images[J]. IEEE Access, 2017, 5: 5013–5019. [DOI:10.1109/ACCESS.2017.2695721]
• [14] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceeding of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 580-587.[DOI: 10.1109/CVPR.2014.81]
• [15] Girshick R. Fast R-CNN[C]//Proceeding of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1440-1448.[DOI: 10.1109/ICCV.2015.169]
• [16] Hou X D, Zhang L Q. Saliency detection: a spectral residual approach[C]//Proceeding of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN, USA: IEEE, 2007: 1-8.[DOI: 10.1109/CVPR.2007.383267]
• [17] Guo C L, Ma Q, Zhang L M. Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform[C]//Proceeding of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8.[DOI: 10.1109/CVPR.2008.4587715]
• [18] Johnson J, Karpathy A, Li F F. DenseCap: fully convolutional localization networks for dense captioning[C]//Proceeding of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 4565-4574.[DOI: 10.1109/CVPR.2016.494]
• [19] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceeding of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3431-3440.[DOI: 10.1109/CVPR.2015.7298965]
• [20] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceeding of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2414-2423.[DOI: 10.1109/CVPR.2016.265]
• [21] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution[C]//Proceeding of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 694-711.[DOI: 10.1007/978-3-319-46475-6_43]
• [22] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012: 1097-1105.
• [23] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv: 1409.1556, 2014.
• [24] Davis J W, Keck M A. A two-stage template approach to person detection in thermal imagery[C]//Proceeding of the 7th IEEE Workshops on Applications of Computer Vision. Breckenridge, CO, USA: IEEE, 2005: 364-369.[DOI: 10.1109/ACVMOT.2005.14]