Pedestrian detection method based on density and score refinement
2021, Vol. 26, No. 2, pages 425-437
Print publication date: 2021-02-16
Accepted: 2020-05-20
DOI: 10.11834/jig.200060
Ye Zhen, Zilei Wang, Feng Wu. Pedestrian detection method based on density and score refinement[J]. Journal of Image and Graphics, 2021, 26(2): 425-437.
Objective
Pedestrian detection aims to locate all pedestrians in an image or video with bounding boxes and confidence scores. Traditional image-based pedestrian detection methods fail on pedestrians with varied poses or mutual occlusion. Deep neural networks (DNNs) perform well in object detection, yet some problems in pedestrian detection remain difficult to solve. This paper proposes DC-CSP (density map and classifier modules with center and scale prediction), a pedestrian detection method that fuses density estimation and score refinement.
Method
First, a density map module (DMM) and a classifier module (CM) are added to the CSP (center and scale prediction) network, yielding the DC-CSP network. Then, to address inaccurate confidence scores, a stage score fusion (SSF) rule is designed that exploits the complementary score predictions of the different modules to update the detection scores, raising pedestrian confidences and lowering background confidences. Finally, based on NMS (non-maximum suppression) and the estimated pedestrian density map, an improved adaptive NMS (IAN) post-processing method is designed to further improve the detection results: it raises the intersection-over-union (IoU) threshold for mutually occluded pedestrians to reduce missed detections and lowers it for isolated pedestrians to reduce false detections.
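The score-update step of the SSF rule described above can be sketched in plain Python. The thresholds `t_ped`, `t_bg` and the step `delta` below are illustrative assumptions, not values from the paper:

```python
def stage_score_fusion(det_score, cls_score, t_ped=0.7, t_bg=0.3, delta=0.1):
    """Fuse the detection-head score with the classifier-module score.

    t_ped, t_bg and delta are hypothetical hyper-parameters chosen for
    illustration; the paper's exact rule and values may differ.
    """
    if cls_score >= t_ped:
        # classifier is confident the sample is a pedestrian: boost slightly
        return min(1.0, det_score + delta)
    if cls_score <= t_bg:
        # classifier is confident the sample is background: suppress slightly
        return max(0.0, det_score - delta)
    # ambiguous case: make a comprehensive judgment by averaging both scores
    return 0.5 * (det_score + cls_score)
```

For example, a detection scored 0.5 that the classifier confidently labels a pedestrian moves up to 0.6, while the same detection labeled background moves down to 0.4.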
Result
Quantitative and qualitative analyses are conducted on the public Citypersons and Caltech datasets. Quantitatively, compared with other methods, the log-average miss rate of our method drops by 0.8%, 1.3%, 1.0%, and 0.8% on the Reasonable, Heavy, Partial, and Bare subsets of Citypersons, and by 0.3% and 0.7% on the Reasonable and All subsets of Caltech. Qualitatively, the visualization results show that our method alleviates, to a certain extent, missed detections of mutually occluded pedestrians, false detections of single pedestrians, and inaccurate confidence scores across a variety of scenes. In addition, ablation experiments verify the effectiveness of the designed modules and their corresponding rules.
Conclusion
Our method uses a convolutional neural network (CNN) that combines multiple modules, designing the IAN method and the SSF rule for density features and classification features, respectively. It alleviates, to a certain extent, missed detections of mutually occluded pedestrians, false detections of single pedestrians, and inaccurate confidence scores, and its effectiveness and robustness are demonstrated on multiple datasets.
Objective
Pedestrian detection involves locating all pedestrians in images or videos using rectangular boxes with confidence scores. Traditional pedestrian detection methods cannot handle pedestrians with varied postures or mutual occlusion. In recent years, deep neural networks have performed well in object detection, but they still cannot solve some challenging issues in pedestrian detection. In this study, we propose a method called DC-CSP (density map and classifier modules with center and scale prediction) that enhances pedestrian detection by combining pedestrian density estimation and score refinement. Under an anchor-free architecture, our method first refines the classification to obtain more accurate confidence scores and then uses different IoU (intersection over union) thresholds to handle varying pedestrian densities, with the objective of reducing both the omission of occluded pedestrians and the false detection of single pedestrians.
Method
First, our DC-CSP network is primarily composed of a center and scale prediction (CSP) subnetwork, a density map module (DMM), and a classifier module (CM). The CSP subnetwork includes a feature extraction module and a detection head module. The feature extraction module uses ResNet-50 as its backbone, in which the output feature maps are down-sampled by 4, 8, 16, and 16 with respect to the input image. The shallower features provide more precise localization information, and the deeper features contain more semantic information with larger receptive fields. Thus, we fuse the multi-scale feature maps from all stages into a single one with deconvolution layers. Upon the concatenation of feature maps, the detection head module first uses a 3×3 convolutional layer to reduce the channel dimension to 256 and then two sibling 1×1 convolutional layers to produce the center heat map and the scale map. On the basis of the CSP subnetwork, we design a density estimation module that first uses the concatenated feature maps to generate 128-channel features via a 1×1 convolutional layer and then concatenates them with the center heat map and the scale map to predict a pedestrian density map with a 5×5 convolutional kernel. The density estimation module integrates diverse features and applies a large kernel to take surrounding information into account, generating accurate density maps. Moreover, the CM takes as input the bounding boxes transformed from the center heat map and the scale map. This module uses the concatenated feature maps to produce 256-channel features via a 3×3 convolutional layer and then classifies them with a 1×1 convolutional layer. Because the majority of the background confidence scores fall below a certain threshold, we can obtain a threshold that easily distinguishes pedestrians from the background.
Second, the detection scores in CSP are relatively low, whereas the CM better discriminates between pedestrians and the background. Therefore, to increase the confidence scores of pedestrians and simultaneously decrease those of the background in the final decision, we design a stage score fusion (SSF) rule that updates the detection scores by exploiting the complementarity of the detection head module and the CM. In particular, when the classifier judges a sample as a pedestrian, the SSF rule slightly boosts its detection score; when the classifier judges a sample as the background, the SSF rule slightly decreases it; in other cases, a comprehensive judgment is made by averaging the scores from both modules.
Third, an improved adaptive non-maximum suppression (NMS) post-processing method, called improved adaptive NMS (IAN) and based on the estimated pedestrian density map, is also proposed to further improve the detection results. In particular, a high IoU threshold is used for mutually occluded pedestrians to reduce missed detections, and a low IoU threshold is used for single pedestrians to reduce false detections. In contrast with adaptive NMS, our IAN method fully considers various scenes. In addition, IAN is based on NMS rather than soft NMS and thus involves a lower computational cost.
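Density-adaptive suppression in this spirit can be sketched as a greedy NMS whose IoU threshold is taken from the estimated density at each kept box. The clamping rule and the bounds `min_thr`/`max_thr` below are assumptions for illustration; the abstract does not give IAN's exact adaptation formula:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def density_aware_nms(boxes, scores, densities, min_thr=0.4, max_thr=0.65):
    """Greedy NMS with a per-box IoU threshold derived from pedestrian density.

    Crowded regions (high density) get a higher threshold so overlapping
    true pedestrians survive; isolated regions get a lower threshold so
    duplicate boxes on a single pedestrian are suppressed.  min_thr and
    max_thr are illustrative values, not the paper's.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box
        keep.append(i)
        thr = min(max(densities[i], min_thr), max_thr)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thr]
    return keep
```

With two overlapping boxes in a high-density region both survive, whereas an equally overlapping duplicate box on an isolated pedestrian is suppressed, matching the behavior described above.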
Result
To verify the effectiveness of the proposed modules, we conduct a series of ablation experiments, in which C-CSP, D-CSP, and DC-CSP respectively denote the addition of the CM, the DMM, and both modules to the CSP subnetwork. We conduct quantitative and qualitative analyses on two widely used public datasets, i.e., Citypersons and Caltech, for each setting. The experimental results of C-CSP verify the rationality of the SSF rule and demonstrate that the confidence scores of pedestrians can be increased while those of the background are decreased. Meanwhile, the experimental results of D-CSP demonstrate the effectiveness of the IAN method, which considerably reduces missed and false detections. In the quantitative analyses of DC-CSP, its log-average miss rate decreases by 0.8%, 1.3%, 1.0%, and 0.8% on the Reasonable, Heavy, Partial, and Bare subsets of Citypersons, respectively, and by 0.3% and 0.7% on the Reasonable and All subsets of Caltech, respectively, compared with other methods. In the qualitative analyses of DC-CSP, the visualization results show that our method works well in various scenes, such as pedestrians occluded by other objects, smaller pedestrians, vertical structures, and false reflections. Pedestrians in different scenes are detected more accurately, and the confidence scores are more convincing. Furthermore, our method avoids numerous false detections in scenes with complex backgrounds.
Conclusion
In this study, we propose a deep convolutional neural network with multiple novel modules for pedestrian detection. In particular, the IAN method and the SSF rule are designed to exploit density features and classification features, respectively. Our DC-CSP method considerably alleviates common issues in pedestrian detection, such as missed detections, false detections, and inaccurate confidence scores. Its effectiveness and robustness are verified on multiple benchmark datasets.
Keywords: urban scenes; pedestrian detection; convolutional neural network (CNN); density map; score fusion; adaptive post-processing
Brazil G, Yin X and Liu X M. 2017. Illuminating pedestrians via simultaneous detection and segmentation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4960-4969 [DOI: 10.1109/ICCV.2017.530]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Dollár P, Tu Z W, Perona P and Belongie S. 2009. Integral channel features//Proceedings of the British Machine Vision Conference. London, UK: BMVA Press: #91 [DOI: 10.5244/c.23.91]
Felzenszwalb P F, Girshick R B, McAllester D and Ramanan D. 2010. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9): 1627-1645 [DOI: 10.1109/TPAMI.2009.167]
Liu S T, Huang D and Wang Y H. 2019b. Adaptive NMS: refining pedestrian detection in a crowd//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6452-6461 [DOI: 10.1109/CVPR.2019.00662]
Liu W, Liao S C, Hu W D, Liang X Z and Chen X. 2018. Learning efficient single-stage pedestrian detectors by asymptotic localization fitting//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 634-659 [DOI: 10.1007/978-3-030-01264-9_38]
Liu W, Liao S C, Ren W Q, Hu W D and Yu Y N. 2019a. High-level semantic feature detection: a new perspective for pedestrian detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5182-5191 [DOI: 10.1109/CVPR.2019.00533]
Mao J Y, Xiao T T, Jiang Y N and Cao Z M. 2017. What can help pedestrian detection?//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6034-6043 [DOI: 10.1109/CVPR.2017.639]
Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Advances in Neural Information Processing Systems. Montreal, Canada: 91-99
Song T, Sun L Y, Xie D, Sun H M and Pu S L. 2018. Small-scale pedestrian detection based on topological line localization and temporal feature aggregation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 554-569
Wang C C R and Lien J J J. 2007. AdaBoost learning for human detection based on histograms of oriented gradients//Proceedings of the Asian Conference on Computer Vision. Berlin, Germany: Springer: 885-895
Wang X L, Xiao T T, Jiang Y N, Shao S, Sun J and Shen C H. 2018. Repulsion loss: detecting pedestrians in a crowd//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7774-7783 [DOI: 10.1109/CVPR.2018.00811]
Zhang L L, Lin L, Liang X D and He K M. 2016. Is faster R-CNN doing well for pedestrian detection?//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 443-457 [DOI: 10.1007/978-3-319-46475-6_28]
Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018. Occlusion-aware R-CNN: detecting pedestrians in a crowd//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 657-674 [DOI: 10.1007/978-3-030-01219-9_39]
Zhang S, Benenson R and Schiele B. 2017. Citypersons: a diverse dataset for pedestrian detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3213-3221