Pedestrian detection in complex thermal infrared surveillance scene

Ming Xu; Xiaosheng Yu; Dongyue Chen; Chengdong Wu; Tong Jia; Jingyu Ru

doi:10.11834/jig.180299

Image Analysis and Recognition


Vol. 23, No. 12, 2018, pp. 1829-1837
Received: 2018-05-10; revised: 2018-07-07; published in print: 2018-12-16

Ming Xu, Xiaosheng Yu, Dongyue Chen, Chengdong Wu, Tong Jia, Jingyu Ru. Pedestrian detection in complex thermal infrared surveillance scene[J]. Journal of Image and Graphics, 2018, 23(12): 1829-1837. DOI: 10.11834/jig.180299.

Summary

Objective

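Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a fundamental task in practical applications such as public security, disaster rescue, and smart cities. Most current thermal infrared pedestrian detection algorithms assume that human targets have higher gray values than the surrounding scene, so their detection rate drops when a rise in ambient temperature causes gray value inversion in the thermal infrared image. To improve the robustness of pedestrian detection systems across different scenes and to raise the pedestrian detection rate, this paper proposes a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection for thermal infrared surveillance scenes.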

Method

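The algorithm first performs frequency domain-based saliency detection on the thermal infrared image to generate a saliency map that fully covers the pedestrian targets. It then combines this saliency map with the original thermal infrared image to generate a region of interest (ROI) map, which serves as the input of a fully convolutional network whose output is a pedestrian target probability map. Finally, the thermal infrared pedestrian detection system is trained end to end, and pedestrian targets are detected from the probability map output by the network.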
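The abstract does not name the specific frequency-domain saliency model or state how the saliency map and the original image are combined into the ROI map. Purely as an illustration, the sketch below uses the spectral residual approach of Hou and Zhang (CVPR 2007), one well-known frequency-domain saliency method, together with a hypothetical ROI rule (`roi_map`, keeping original gray values wherever saliency exceeds `k` times its mean). The helper names and parameter values (`sigma`, `k`) are assumptions, not the authors' implementation; only NumPy is needed.

```python
import numpy as np


def _box_mean(x: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter with replicate (edge) padding; k must be odd."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)


def _gaussian_blur(x: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output keeps the input shape."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    xp = np.pad(x, radius, mode="edge")
    # 'valid' convolution trims exactly the padding added above
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, xp)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)


def spectral_residual_saliency(img: np.ndarray, sigma: float = 2.5) -> np.ndarray:
    """Spectral residual saliency map (Hou & Zhang, 2007), scaled to [0, 1]."""
    f = np.fft.fft2(img.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)  # small offset avoids log(0)
    phase = np.angle(f)
    # spectral residual: log amplitude minus its local (3 x 3) average
    residual = log_amp - _box_mean(log_amp, 3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = _gaussian_blur(sal, sigma)
    sal -= sal.min()
    return sal / (sal.max() + 1e-12)


def roi_map(img: np.ndarray, saliency: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Hypothetical ROI map: original gray values where saliency > k * mean, else 0."""
    mask = saliency > k * saliency.mean()
    return np.where(mask, img, 0).astype(float)


# Usage: a warm, roughly pedestrian-sized Gaussian blob on a uniform background.
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((yy - 26) ** 2 + (xx - 32) ** 2) / (2 * 3.0 ** 2))
sal = spectral_residual_saliency(img)
roi = roi_map(img, sal)
```

One intuition for why a frequency-domain map can tolerate gray value inversion: if inversion is modeled as `I' = M - I` for a constant `M`, every nonzero-frequency Fourier coefficient of `I'` is the negated coefficient of `I`, so the amplitude spectrum changes only at the zero-frequency term. A fixed brightness threshold, by contrast, would select the opposite set of pixels.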

Result

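The algorithm is validated on the OSU thermal infrared pedestrian database from the OTCBVS infrared video dataset established by The Ohio State University, and it is compared with five relatively mature algorithms. Experimental results show that the proposed algorithm accurately detects pedestrian targets in various scenes. Using MR-FP (miss rate versus false positives) as the basis for comparison, its average miss rate of 7% is lower than those of the other algorithms, indicating a higher detection rate and better robustness to gray value inversion in thermal infrared images.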

Conclusion

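This paper proposes a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection for thermal infrared surveillance scenes. The algorithm can be trained end to end, and it improves both its robustness to various complex scenes and the pedestrian detection rate, thereby improving pedestrian detection performance in thermal infrared surveillance systems.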

Abstract

Objective

Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a crucial task in several practical applications, such as public security management, disaster relief, and intelligent surveillance. Existing thermal infrared-based pedestrian detection algorithms generally consist of two steps. In the first step, several regions of interest (ROIs) suspected of containing human targets are generated from the thermal infrared image. The second step then verifies whether each ROI is a human target. The verification can be performed by a classifier applied to features extracted from the ROIs, or the feature extraction and classification tasks can be combined by adopting a deep learning method. However, in their first step, most existing thermal infrared-based pedestrian detection algorithms rely heavily on the assumption that the gray value of the human target is higher than that of the environment, which makes them ineffective at high ambient temperatures. As the ambient temperature rises, gray value inversion occurs; that is, the gray value of the environment in the thermal infrared image becomes higher than that of the human target, which reduces the accuracy of pedestrian detection. On this basis, a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed, which aims to improve the robustness of pedestrian detection systems in thermal infrared surveillance scenes and to achieve higher pedestrian detection accuracy.

Method

In the algorithm, frequency domain-based saliency detection is first employed to generate a saliency map that covers all pedestrian targets in the original thermal infrared image. Unlike existing methods, the saliency detection-based method responds to the saliency of human targets rather than to their gray values. Therefore, the subsequent generation of the ROI map is not bound by the assumption that the gray value of the human target is high, which avoids the detection errors caused by the failure of this assumption when the ambient temperature is high. In addition, the algorithm generates one full-size saliency map rather than several sub-regions. Then, a fully convolutional network is constructed, in which the ROI map generated from the saliency map and the original thermal infrared image is defined as the network input, and the pedestrian target probability map is defined as the network output. The network consists of two parts. The first part, which mainly follows the AlexNet and VGG network structures, serves as the feature extraction module. The second part is the probability generation module, which consists of three deconvolution layers that use kernels of two different sizes. A sigmoid activation function is used in the last layer to generate the probability map of pedestrian targets, and the remaining layers use the ReLU activation function. The proposed thermal infrared pedestrian detection system is trained end to end to obtain the pedestrian probability map and thereby detect pedestrian targets.
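The abstract states that the last layer uses a sigmoid to produce a pedestrian probability map but does not describe how individual detections are read off that map. A minimal post-processing sketch, assuming a hypothetical threshold (`thresh=0.5`), 4-connected grouping of foreground pixels, and a hypothetical minimum blob area (`min_area`), could look like this; each resulting box is scored by the mean probability inside its blob.

```python
from collections import deque

import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function, mapping network logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def boxes_from_probability_map(prob: np.ndarray, thresh: float = 0.5, min_area: int = 4):
    """Group pixels with prob >= thresh into 4-connected blobs.

    Returns (row0, col0, row1, col1, score) tuples with half-open bounds;
    score is the mean probability inside the blob. `thresh` and `min_area`
    are hypothetical tuning parameters.
    """
    fg = prob >= thresh
    seen = np.zeros(fg.shape, dtype=bool)
    h, w = fg.shape
    boxes = []
    for r in range(h):
        for c in range(w):
            if not fg[r, c] or seen[r, c]:
                continue
            # breadth-first search over one connected foreground blob
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # discard tiny blobs, which are likely noise
            ys, xs = zip(*pixels)
            score = float(np.mean([prob[p] for p in pixels]))
            boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1, score))
    return boxes


# Usage: two synthetic "pedestrian" logit blobs plus one isolated hot pixel.
logits = np.full((20, 20), -4.0)
logits[2:8, 3:6] = 4.0
logits[12:18, 10:14] = 3.0
logits[0, 19] = 5.0
print(boxes_from_probability_map(sigmoid(logits)))
```

The isolated pixel passes the threshold but is dropped by `min_area`, which is one simple way to suppress small non-pedestrian heat spots; a real system might instead tune these values on validation data.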

Result

The Ohio State University (OSU) thermal infrared pedestrian database in the OTCBVS infrared video dataset, which was also established by OSU, is employed to verify the algorithm, and the proposed algorithm is compared with five existing mature algorithms. The database contains 10 sequences captured from a single surveillance viewpoint under several weather conditions, such as sunny, cloudy, and rainy days, which allows a comprehensive test of pedestrian detection performance. Besides the methods that are not based on convolutional neural networks, the performance of a region-based convolutional neural network (R-CNN) is also plotted. The results show that the proposed algorithm can accurately detect pedestrian targets under various environmental conditions, and several sample detection results are presented. Taking the miss rate versus false positive (MR-FP) indicator as the basis for comparison, the proposed algorithm achieves an average miss rate of 7% and performs better than existing thermal infrared-based pedestrian detection methods and basic deep learning-based object detection methods. It achieves a high detection rate and shows better robustness to gray value inversion in thermal infrared images. In the detection process, the proposed algorithm can remove non-pedestrian targets and detect most pedestrians in thermal images, especially when the scene is complex, for example when other heat sources (street lights) are present or during the daytime.
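The MR-FP comparison plots miss rate against false positives as the detection score threshold is lowered. The sketch below computes that curve as miss rate versus false positives per image (FPPI), a common convention in pedestrian detection. It assumes detections have already been matched to ground truth (for example by an IoU criterion), the function name `miss_rate_fppi` is hypothetical, and it does not attempt to reproduce how the paper averages the curve into its 7% figure.

```python
import numpy as np


def miss_rate_fppi(scores, is_tp, num_gt: int, num_images: int):
    """Miss rate and false positives per image (FPPI) at every score threshold.

    scores: confidence of each detection over the whole test set.
    is_tp:  1 if that detection matched a not-yet-matched ground-truth
            pedestrian, else 0 (the matching step itself is assumed done).
    Entry i of each returned array corresponds to keeping the i + 1
    highest-scoring detections.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_tp, dtype=int)[order]
    tp = np.cumsum(hits)            # true positives kept so far
    fp = np.cumsum(1 - hits)        # false positives kept so far
    miss_rate = 1.0 - tp / num_gt   # fraction of pedestrians still missed
    fppi = fp / num_images
    return miss_rate, fppi


# Usage: four detections over two images that contain four pedestrians in total.
mr, fppi = miss_rate_fppi([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4, num_images=2)
for m, f in zip(mr, fppi):
    print(f"FPPI={f:.2f}  miss rate={m:.2f}")
```

Plotting `mr` against `fppi`, usually with a logarithmic FPPI axis, gives a curve of the kind used for MR-FP comparisons; a lower curve means fewer missed pedestrians at the same false positive budget.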

Conclusion

This study proposes a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection for thermal infrared surveillance scenes. In the first step, a saliency detection method that is robust to the gray value inversion occurring at high ambient temperatures, such as in hot summers or during the daytime, is employed to generate a full-size ROI map. Subsequently, a fully convolutional network outputs the probability map of pedestrian targets. The proposed algorithm can be trained end to end and avoids generating many sub-regions, which makes it efficient because no redundant computation or storage is required. Experimental results show that the proposed method improves the robustness of pedestrian detection systems in various complex scenes and achieves a high pedestrian detection rate, confirming that it can enhance pedestrian detection in thermal infrared surveillance systems.

