NSPDet: real-time nearby-aware pedestrian detection algorithm for multi-scene surveillance at night
2023, Vol. 28, No. 9: 2693-2705
Print publication date: 2023-09-16
DOI: 10.11834/jig.220834
Gong An, Li Zhonghao, Liang Chenhong. 2023. NSPDet: real-time nearby-aware pedestrian detection algorithm for multi-scene surveillance at night. Journal of Image and Graphics, 28(09):2693-2705
Objective
Pedestrian detection is a key technology in autonomous driving, video surveillance, and security. To address the drop in accuracy that object detection algorithms suffer in complex nighttime scenes and under occlusion, this paper incorporates a low-light image enhancement algorithm into the nighttime pedestrian detection task for joint training, introduces a nearby objects hallucinator (NOH) module, and proposes an improved nearby-aware pedestrian detection algorithm for nighttime surveillance scenes (nearby-aware surveillance pedestrian detection algorithm, NSPDet).
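The low-light enhancement module adopted in this work is Zero-DCE (zero-reference deep curve estimation; Guo et al., 2020). For context, the sketch below shows the pixel-wise quadratic curve that Zero-DCE iterates to brighten an image. In the actual method the curve maps are predicted by a small network and trained jointly with the detector; supplying the maps directly, as here, and the helper name are illustrative assumptions only.

```python
# Minimal sketch of the Zero-DCE light-enhancement curve (Guo et al., 2020):
# LE_n = LE_{n-1} + A_n * LE_{n-1} * (1 - LE_{n-1}), iterated with per-pixel
# curve maps A_n in [-1, 1], which keeps pixel values inside [0, 1].
# In the real method the A_n maps come from a small CNN; passing them in
# directly here only illustrates the adjustment itself.
import numpy as np

def zero_dce_enhance(img, curve_maps):
    """img: HxWx3 array in [0, 1]; curve_maps: iterable of HxWx3 maps A_n."""
    x = img
    for A in curve_maps:              # Zero-DCE iterates the curve 8 times
        x = x + A * x * (1.0 - x)     # quadratic curve bends dark pixels upward
    return x

dark = np.full((4, 4, 3), 0.1)        # toy under-exposed patch
lit = zero_dce_enhance(dark, [np.full_like(dark, 0.5)] * 8)
print(float(dark[0, 0, 0]), round(float(lit[0, 0, 0]), 2))  # 0.1 -> 0.84
```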
Method
To improve the accuracy of nighttime pedestrian detection, a low-light enhancement module, zero-reference deep curve estimation (Zero-DCE), is added to the baseline model. To reduce the missed and false detections caused by dense crowds and occlusion, NOH is used to model the distribution of surrounding pedestrians and a pedestrian detection head (PedestrianHead) is proposed. To reduce the number of parameters and speed up inference, the model is made lightweight with depthwise separable convolutions.
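As an illustration of the lightweighting step, the following sketch shows a depthwise separable convolution block in PyTorch (cf. the MobileNets reference). The class name, activation, and channel sizes are illustrative assumptions, not the paper's actual layers.

```python
# A depthwise separable convolution factorizes a 3x3 convolution into a
# per-channel (depthwise) 3x3 convolution followed by a 1x1 (pointwise)
# convolution, sharply cutting parameters and FLOPs.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A standard 3x3 conv needs in_ch*out_ch*9 weights; the separable version needs
# in_ch*9 + in_ch*out_ch, roughly an 8.7x reduction for 256 output channels.
x = torch.randn(1, 256, 40, 40)
print(DepthwiseSeparableConv(256, 256)(x).shape)  # torch.Size([1, 256, 40, 40])
```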
Result
Three groups of ablation experiments were conducted on the NightSurveillance dataset. Compared with the baseline model YOLOX (exceeding YOLO (you only look once) series), the most accurate NSPDet variant improves average precision (AP) and average recall (AR) by 10.1% and 7.2%, respectively. In addition, the lightweight NSPDet model has 16.4 M fewer parameters, with AP and AR decreasing by 7.6% and 6.2%, yet it still outperforms the baseline. Comparative experiments against other methods on the Caltech (Caltech pedestrian dataset), CityPersons (a diverse dataset for pedestrian detection), and NightOwls (a pedestrians at night dataset) datasets show that the proposed nighttime pedestrian detection algorithm achieves a low average false detection rate.
Conclusion
The proposed nighttime pedestrian detection algorithm improves the nighttime detection accuracy of the baseline model, offers real-time inference performance, and shows good robustness in complex nighttime scenes.
Objective
Pedestrian detection is a widely studied topic in computer vision. It is also a basic and critical technology in automatic driving assistance systems, visual surveillance, and behavior recognition. In the traffic environment, pedestrians and cyclists belong to the “vulnerable groups on the road”. World Health Organization (WHO) statistics show that approximately half of all fatalities in road accidents involve pedestrians. Unlike conventional detection objects (e.g., automobiles) with relatively stable structural characteristics, pedestrians are nonrigid and structurally unstable because of their varied limb movements, which complicates detection. Nighttime scenes are even more challenging, yet research on nighttime pedestrian detection remains insufficient both domestically and internationally. Under insufficient illumination and local overexposure, pedestrian detection algorithms suffer from reduced accuracy, leading to missed and incorrect detections. Therefore, nighttime pedestrian detection technology has important research and social value for ensuring pedestrian safety.
Method
Nighttime monitoring conditions are constrained by uneven and insufficient lighting, so the acquired images are poorly exposed, which reduces the effectiveness of pedestrian detection. To address this issue and boost the model’s nighttime detection performance, the present study adds a low-light enhancement module (Zero-DCE) to the detector. We feed the detector’s regression loss and the detected location information back to the low-light enhancement module so that the low-light image enhancement and pedestrian detection tasks are trained jointly and the enhancement acts as a positive gain for the detection task. This approach maintains the regional continuity of pedestrian features in the image and avoids the accuracy degradation caused when pixel-level low-light enhancement destroys features in the pedestrian region. Pedestrian detection has a long history. Classic strategies that model human features with histograms of oriented gradients (HOG) and classify them with a support vector machine (SVM) have been widely studied. However, such traditional methods rely on feature engineering, and the hand-crafted features yield limited accuracy and generalization. In recent years, deep learning algorithms have been applied to pedestrian detection, and convolutional neural networks (CNNs), which extract high-level features, have gradually become the mainstream approach. Depending on whether they rely on region proposals, deep learning-based pedestrian detection algorithms can be broadly divided into two-stage and one-stage methods. Two-stage methods first generate preselected regions in the image and then classify and regress these regions; representative methods are R-CNN and Faster R-CNN. Region-proposal-based detectors capture rich features and thus achieve high accuracy, but they suffer from redundant preselected regions and slow inference. One-stage methods dispense with region proposals and directly regress the target’s position in the image, which simplifies the detection pipeline and accelerates inference; representative methods are the single shot multibox detector (SSD), you only look once v3 (YOLOv3), and YOLOX, proposed by MEGVII. Considering both detection accuracy and inference speed, this study selects the one-stage YOLOX as the baseline model and optimizes it specifically for night scenes. Another significant issue in pedestrian detection is the missed and incorrect detections brought on by occlusion among pedestrians and dense crowds. The original non-maximum suppression (NMS) algorithm tends to falsely delete detection boxes when numerous pedestrians are densely distributed, which leads to missed detections. To address this problem, the present study reconsiders the NMS strategy at the inference stage and introduces a suppression scheme based on the nearby objects hallucinator (NOH), which incorporates the distribution information of nearby pedestrian targets. We eliminate NOH’s dependence on region proposals so that it can be ported to the one-stage detector.
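To make the suppression idea concrete, here is a simplified numpy sketch of density-aware greedy NMS in the spirit of NOH-NMS and Adaptive NMS: each box carries a predicted nearby-pedestrian density, and a higher density raises that box's suppression threshold so genuine neighbours are not deleted. The function names and the scalar-density simplification are assumptions for illustration; the paper's NOH formulation models a fuller distribution over nearby objects.

```python
# Simplified density-aware NMS in the spirit of NOH-NMS / Adaptive NMS.
# The per-box density would come from the NOH branch of the detector; here it
# is passed in, and a single scalar stands in for the nearby-object distribution.
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def density_aware_nms(boxes, scores, density, base_thr=0.5):
    """Greedy NMS whose IoU threshold per kept box is raised toward the
    predicted crowd density, keeping overlapping neighbours in crowded areas."""
    order = np.argsort(-scores)
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(int(i))
        thr = max(base_thr, float(density[i]))   # crowded -> looser suppression
        suppressed |= iou_one_to_many(boxes[i], boxes) > thr
    return keep

# Two heavily overlapping pedestrians in a crowd plus one isolated pedestrian:
# plain NMS at 0.5 would drop box 1 (IoU with box 0 is about 0.54), but the
# high predicted density keeps it.
boxes = np.array([[0, 0, 10, 20], [3, 0, 13, 20], [30, 0, 40, 20]], dtype=float)
scores = np.array([0.90, 0.85, 0.80])
density = np.array([0.70, 0.70, 0.30])
print(density_aware_nms(boxes, scores, density))  # -> [0, 1, 2]
```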
The bounding box features predicted by YOLOX are pooled into the same feature space, and a simple fully connected module then predicts the location distribution and density information of nearby pedestrians required by NOH. The improved NOH module is combined with the original YOLOHead into PedestrianHead to produce the final pedestrian detections. Experiments show that adding such a fully connected module effectively reduces the missed detections caused by occlusion while slightly improving inference speed. However, fully connected modules inevitably bring redundant parameters to the network, so this study further reduces the model size: depthwise separable convolutions are used to build a lightweight model that maintains detection accuracy while lowering the computation required for inference. The floating-point computation of the lightweight model is reduced to 22.4 GFLOPs, which in theory meets the needs of real-time inference on mobile devices.
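A hedged sketch of the extra fully connected branch described above: pooled per-box features are fed to a small MLP that outputs a nearby-object box estimate and a density score. All dimensions, names, and the exact output parameterization are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative PyTorch sketch of a fully connected nearby-pedestrian branch
# attached to the detection head. Feature dimension, hidden width, and the
# (4 offsets + 1 density) output layout are assumptions for demonstration.
import torch
import torch.nn as nn

class NearbyBranch(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 5),   # 4 nearby-box offsets + 1 crowd-density logit
        )

    def forward(self, box_feats):            # box_feats: (num_boxes, feat_dim)
        out = self.mlp(box_feats)
        nearby_offsets = out[:, :4]          # where the closest neighbour likely is
        density = torch.sigmoid(out[:, 4])   # crowding score in [0, 1]
        return nearby_offsets, density

feats = torch.randn(100, 256)                # pooled features for 100 candidate boxes
offsets, density = NearbyBranch()(feats)
print(offsets.shape, density.shape)          # torch.Size([100, 4]) torch.Size([100])
```

In this sketch the density output is the kind of signal a suppression routine such as the density-aware NMS sketched earlier would consume; in the paper, this information is fused with the YOLOHead outputs inside PedestrianHead.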
Result
We divided the ablation experiments into three groups for verification on the NightSurveillance dataset. Compared with the baseline model (YOLOX), NSPDet increases average precision (AP) and average recall (AR) by 10.1% and 7.2%, respectively. In addition, the lightweight NSPDet model has 16.4 M fewer parameters, with AP and AR decreasing by 7.6% and 6.2%, respectively, yet it still outperforms the baseline model. Comparison experiments with other methods on the Caltech, CityPersons, and NightOwls datasets show that the nighttime pedestrian detection algorithm proposed in this study has a low average false detection rate.
Conclusion
The NSPDet algorithm proposed in this study improves the accuracy of the baseline model for pedestrian detection at night and achieves real-time inference. The accuracy of the baseline model is optimized for various complex nighttime scenes, including low light, strong light interference, image blur, occlusion, and rainy weather. The algorithm has important application value for promoting research in autonomous driving and intelligent transportation.
Keywords: night surveillance pedestrian detection; low-light enhancement; YOLOX; nearby objects hallucinator (NOH); depthwise separable convolution (DSC)
Bodla N, Singh B, Chellappa R and Davis L S. 2017. Soft-NMS—improving object detection with one line of code//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5562-5570 [DOI: 10.1109/ICCV.2017.593]
Brazil G, Xi Y and Liu X M. 2017. Illuminating pedestrians via simultaneous detection and segmentation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4960-4969 [DOI: 10.1109/ICCV.2017.530]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Dollár P, Appel R, Belongie S and Perona P. 2014. Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8): 1532-1545 [DOI: 10.1109/TPAMI.2014.2300479]
Dollár P, Wojek C, Schiele B and Perona P. 2012. Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4): 743-761 [DOI: 10.1109/TPAMI.2011.155]
Ge Z, Liu S T, Wang F, Li Z M and Sun J. 2021. YOLOX: exceeding YOLO series in 2021 [EB/OL]. [2022-09-26]. http://arxiv.org/pdf/2107.08430.pdf
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
Guo C L, Li C Y, Guo J C, Loy C C, Hou J H, Kwong S and Cong R M. 2020. Zero-reference deep curve estimation for low-light image enhancement//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1777-1786 [DOI: 10.1109/CVPR42600.2020.00185]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI: 10.1109/TPAMI.2015.2389824]
Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2022-09-26]. http://arxiv.org/pdf/1704.04861.pdf
Liu S T, Huang D and Wang Y H. 2019. Adaptive NMS: refining pedestrian detection in a crowd//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6452-6461 [DOI: 10.1109/CVPR.2019.00662]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Liu W J, Dong L B and Qu H C. 2021. Small-scale pedestrian detection based on improved R-FCN model. Journal of Image and Graphics, 26(10): 2400-2410 [DOI: 10.11834/jig.200287]
Neumann L, Karg M, Zhang S S, Scharfenberger C, Piegert E, Mistr S, Prokofyeva O, Thiel R, Vedaldi A, Zisserman A and Schiele B. 2018. NightOwls: a pedestrians at night dataset//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 691-705 [DOI: 10.1007/978-3-030-20887-5_43]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2022-09-26]. http://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Toroyan T. 2009. Global status report on road safety. Injury Prevention, 15(4): #286 [DOI: 10.1136/ip.2009.023697]
Wang L, Xu L S and Yang M H. 2016. Pedestrian detection in crowded scenes via scale and occlusion analysis//Proceedings of 2016 IEEE International Conference on Image Processing. Phoenix, USA: IEEE: 1210-1214 [DOI: 10.1109/ICIP.2016.7532550]
Wang X, Chen J, Wang Z, Liu W, Satoh S, Liang C and Lin C W. 2021. When pedestrian detection meets nighttime surveillance: a new benchmark//Proceedings of the 29th International Joint Conference on Artificial Intelligence. Yokohama, Japan: ijcai.org: 509-515 [DOI: 10.24963/ijcai.2020/71]
Wang X, Liang C, Chen C, Chen J, Wang Z, Han Z and Xiao C X. 2020. S3D: scalable pedestrian detection via score scale surface discrimination. IEEE Transactions on Circuits and Systems for Video Technology, 30(10): 3332-3344 [DOI: 10.1109/TCSVT.2019.2913114]
Wang X Y, Han T X and Yan S C. 2009. An HOG-LBP human detector with partial occlusion handling//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 32-39 [DOI: 10.1109/ICCV.2009.5459207]
Xu X K, Ma Y, Qian X and Zhang Y. 2021. Scale-aware EfficientDet: real-time pedestrian detection algorithm for automated driving. Journal of Image and Graphics, 26(1): 93-100 [DOI: 10.11834/jig.200445]
Zhang L L, Lin L, Liang X D and He K M. 2016. Is Faster R-CNN doing well for pedestrian detection?//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 443-457 [DOI: 10.1007/978-3-319-46475-6_28]
Zhang S S, Benenson R and Schiele B. 2017. CityPersons: a diverse dataset for pedestrian detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4457-4465 [DOI: 10.1109/CVPR.2017.474]
Zhou P H, Zhou C, Peng P, Du J L, Sun X, Guo X W and Huang F Y. 2020. NOH-NMS: improving pedestrian detection by nearby objects hallucination//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 1967-1975 [DOI: 10.1145/3394171.3413617]