Pedestrian detection method based on density and score refinement
2021, Vol. 26, No. 2, pages 425-437
Print publication date: 2021-02-16
Accepted: 2020-05-20
DOI: 10.11834/jig.200060
Ye Zhen, Zilei Wang, Feng Wu. Pedestrian detection method based on density and score refinement[J]. Journal of Image and Graphics, 2021, 26(2): 425-437.
Objective
Pedestrian detection aims to locate all pedestrians in an image or video with bounding boxes and confidence scores. Traditional image-based pedestrian detection methods fail on pedestrians with varied poses or mutual occlusion. Deep neural networks (DNNs) perform well in object detection, yet some problems in pedestrian detection remain difficult to solve. This paper proposes DC-CSP (density map and classifier modules with center and scale prediction), a pedestrian detection method that fuses density estimation and score refinement.
Method
First, a density map module (DMM) and a classifier module (CM) are added to the CSP (center and scale prediction) network, yielding the DC-CSP network. Then, to address inaccurate confidence scores, a stage score fusion (SSF) rule is designed that exploits the complementary score predictions of the different modules to update the detection scores, raising pedestrian confidences and lowering background confidences. Finally, based on NMS (non-maximum suppression) and the estimated pedestrian density map, an improved adaptive NMS (IAN) post-processing method is designed to further improve the detection results: it raises the intersection-over-union (IoU) threshold for mutually occluded pedestrians to reduce missed detections and lowers it for isolated pedestrians to reduce false detections.
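The score-update step of the SSF rule described above can be sketched in plain Python. The thresholds `t_ped`, `t_bg` and the step `delta` below are illustrative assumptions, not values from the paper:

```python
def stage_score_fusion(det_score, cls_score, t_ped=0.7, t_bg=0.3, delta=0.1):
    """Fuse the detection-head score with the classifier-module score.

    t_ped, t_bg and delta are hypothetical hyper-parameters chosen for
    illustration; the paper's exact rule and values may differ.
    """
    if cls_score >= t_ped:
        # classifier is confident the sample is a pedestrian: boost slightly
        return min(1.0, det_score + delta)
    if cls_score <= t_bg:
        # classifier is confident the sample is background: suppress slightly
        return max(0.0, det_score - delta)
    # ambiguous case: make a comprehensive judgment by averaging both scores
    return 0.5 * (det_score + cls_score)
```

For example, a detection scored 0.5 that the classifier confidently labels a pedestrian moves up to 0.6, while the same detection labeled background moves down to 0.4.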
Result
Quantitative and qualitative analyses are conducted on the public Citypersons and Caltech datasets. Quantitatively, compared with other methods, the log-average miss rate of our method drops by 0.8%, 1.3%, 1.0%, and 0.8% on the Reasonable, Heavy, Partial, and Bare subsets of Citypersons, and by 0.3% and 0.7% on the Reasonable and All subsets of Caltech. Qualitatively, the visualization results show that our method alleviates, to a certain extent, missed detections of mutually occluded pedestrians, false detections of single pedestrians, and inaccurate confidence scores across a variety of scenes. In addition, ablation experiments verify the effectiveness of the designed modules and their corresponding rules.
Conclusion
Our method uses a convolutional neural network (CNN) that combines multiple modules, designing the IAN method and the SSF rule for density features and classification features, respectively. It alleviates, to a certain extent, missed detections of mutually occluded pedestrians, false detections of single pedestrians, and inaccurate confidence scores, and its effectiveness and robustness are demonstrated on multiple datasets.
Objective
Pedestrian detection involves locating all pedestrians in images or videos using rectangular boxes with confidence scores. Traditional pedestrian detection methods cannot handle pedestrians with varied postures or mutual occlusion. In recent years, deep neural networks have performed well in object detection, but they still cannot solve some challenging issues in pedestrian detection. In this study, we propose a method called DC-CSP (density map and classifier modules with center and scale prediction) that enhances pedestrian detection by combining pedestrian density estimation and score refinement. Under an anchor-free architecture, our method first refines the classification to obtain more accurate confidence scores and then uses different IoU (intersection over union) thresholds to handle varying pedestrian densities, with the objective of reducing both the omission of occluded pedestrians and the false detection of single pedestrians.
Method
First, our DC-CSP network is primarily composed of a center and scale prediction (CSP) subnetwork, a density map module (DMM), and a classifier module (CM). The CSP subnetwork includes a feature extraction module and a detection head module. The feature extraction module uses ResNet-50 as its backbone, in which the output feature maps are down-sampled by 4, 8, 16, and 16 with respect to the input image. The shallower features provide more precise localization information, and the deeper features contain more semantic information with larger receptive fields. Thus, we fuse the multi-scale feature maps from all stages into a single one with deconvolution layers. Upon the concatenation of feature maps, the detection head module first uses a 3×3 convolutional layer to reduce the channel dimension to 256 and then two sibling 1×1 convolutional layers to produce the center heat map and the scale map. On the basis of the CSP subnetwork, we design a density estimation module that first uses the concatenated feature maps to generate 128-channel features via a 1×1 convolutional layer and then concatenates them with the center heat map and the scale map to predict a pedestrian density map with a 5×5 convolutional kernel. The density estimation module integrates diverse features and applies a large kernel to take surrounding information into account, generating accurate density maps. Moreover, the CM takes as input the bounding boxes transformed from the center heat map and the scale map. This module uses the concatenated feature maps to produce 256-channel features via a 3×3 convolutional layer and then classifies them with a 1×1 convolutional layer. Because the majority of the background confidence scores fall below a certain threshold, we can obtain a threshold that easily distinguishes pedestrians from the background.
Second, the detection scores in CSP are relatively low, whereas the CM better discriminates between pedestrians and the background. Therefore, to increase the confidence scores of pedestrians and simultaneously decrease those of the background in the final decision, we design a stage score fusion (SSF) rule that updates the detection scores by exploiting the complementarity of the detection head module and the CM. In particular, when the classifier judges a sample as a pedestrian, the SSF rule slightly boosts its detection score; when the classifier judges a sample as the background, the SSF rule slightly decreases it; in other cases, a comprehensive judgment is made by averaging the scores from both modules.
Third, an improved adaptive non-maximum suppression (NMS) post-processing method, called improved adaptive NMS (IAN) and based on the estimated pedestrian density map, is also proposed to further improve the detection results. In particular, a high IoU threshold is used for mutually occluded pedestrians to reduce missed detections, and a low IoU threshold is used for single pedestrians to reduce false detections. In contrast with adaptive NMS, our IAN method fully considers various scenes. In addition, IAN is based on NMS rather than soft NMS and thus involves a lower computational cost.
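Density-adaptive suppression in this spirit can be sketched as a greedy NMS whose IoU threshold is taken from the estimated density at each kept box. The clamping rule and the bounds `min_thr`/`max_thr` below are assumptions for illustration; the abstract does not give IAN's exact adaptation formula:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def density_aware_nms(boxes, scores, densities, min_thr=0.4, max_thr=0.65):
    """Greedy NMS with a per-box IoU threshold derived from pedestrian density.

    Crowded regions (high density) get a higher threshold so overlapping
    true pedestrians survive; isolated regions get a lower threshold so
    duplicate boxes on a single pedestrian are suppressed.  min_thr and
    max_thr are illustrative values, not the paper's.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box
        keep.append(i)
        thr = min(max(densities[i], min_thr), max_thr)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thr]
    return keep
```

With two overlapping boxes in a high-density region both survive, whereas an equally overlapping duplicate box on an isolated pedestrian is suppressed, matching the behavior described above.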
Result
To verify the effectiveness of the proposed modules, we conduct a series of ablation experiments, in which C-CSP, D-CSP, and DC-CSP respectively denote the addition of the CM, the DMM, and both modules to the CSP subnetwork. We conduct quantitative and qualitative analyses on two widely used public datasets, i.e., Citypersons and Caltech, for each setting. The experimental results of C-CSP verify the rationality of the SSF rule and demonstrate that the confidence scores of pedestrians can be increased while those of the background are decreased. Meanwhile, the experimental results of D-CSP demonstrate the effectiveness of the IAN method, which considerably reduces missed and false detections. In the quantitative analyses of DC-CSP, its log-average miss rate decreases by 0.8%, 1.3%, 1.0%, and 0.8% on the Reasonable, Heavy, Partial, and Bare subsets of Citypersons, respectively, and by 0.3% and 0.7% on the Reasonable and All subsets of Caltech, respectively, compared with other methods. In the qualitative analyses of DC-CSP, the visualization results show that our method works well in various scenes, such as pedestrians occluded by other objects, smaller pedestrians, vertical structures, and false reflections. Pedestrians in different scenes are detected more accurately, and the confidence scores are more convincing. Furthermore, our method avoids numerous false detections in scenes with complex backgrounds.
Conclusion
In this study, we propose a deep convolutional neural network with multiple novel modules for pedestrian detection. In particular, the IAN method and the SSF rule are designed to exploit density features and classification features, respectively. Our DC-CSP method considerably alleviates common issues in pedestrian detection, such as missed detections, false detections, and inaccurate confidence scores. Its effectiveness and robustness are verified on multiple benchmark datasets.
Keywords: urban scenes; pedestrian detection; convolutional neural network (CNN); density map; score fusion; adaptive post-processing
Brazil G, Yin X and Liu X M. 2017. Illuminating pedestrians via simultaneous detection and segmentation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4960-4969 [DOI: 10.1109/ICCV.2017.530]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Dollár P, Tu Z W, Perona P and Belongie S. 2009. Integral channel features//Proceedings of the British Machine Vision Conference. London, UK: BMVA Press: #91 [DOI: 10.5244/c.23.91]
Felzenszwalb P F, Girshick R B, McAllester D and Ramanan D. 2010. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9): 1627-1645 [DOI: 10.1109/TPAMI.2009.167]
Liu S T, Huang D and Wang Y H. 2019b. Adaptive NMS: refining pedestrian detection in a crowd//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6452-6461 [DOI: 10.1109/CVPR.2019.00662]
Liu W, Liao S C, Hu W D, Liang X Z and Chen X. 2018. Learning efficient single-stage pedestrian detectors by asymptotic localization fitting//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 634-659 [DOI: 10.1007/978-3-030-01264-9_38]
Liu W, Liao S C, Ren W Q, Hu W D and Yu Y N. 2019a. High-level semantic feature detection: a new perspective for pedestrian detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5182-5191 [DOI: 10.1109/CVPR.2019.00533]
Mao J Y, Xiao T T, Jiang Y N and Cao Z M. 2017. What can help pedestrian detection?//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6034-6043 [DOI: 10.1109/CVPR.2017.639]
Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Advances in Neural Information Processing Systems. Montreal, Canada: 91-99
Song T, Sun L Y, Xie D, Sun H M and Pu S L. 2018. Small-scale pedestrian detection based on topological line localization and temporal feature aggregation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 554-569
Wang C C R and Lien J J J. 2007. AdaBoost learning for human detection based on histograms of oriented gradients//Proceedings of the Asian Conference on Computer Vision. Berlin, Germany: Springer: 885-895
Wang X L, Xiao T T, Jiang Y N, Shao S, Sun J and Shen C H. 2018. Repulsion loss: detecting pedestrians in a crowd//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7774-7783 [DOI: 10.1109/CVPR.2018.00811]
Zhang L L, Lin L, Liang X D and He K M. 2016. Is faster R-CNN doing well for pedestrian detection?//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 443-457 [DOI: 10.1007/978-3-319-46475-6_28]
Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018. Occlusion-aware R-CNN: detecting pedestrians in a crowd//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 657-674 [DOI: 10.1007/978-3-030-01219-9_39]
Zhang S, Benenson R and Schiele B. 2017. Citypersons: a diverse dataset for pedestrian detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3213-3221