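Gong An1, Li Zhonghao1, Liang Chenhong2 (1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580; 2. School of Informatics, Xiamen University, Xiamen 361104)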
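Objective Pedestrian detection is a key technology in autonomous driving, security surveillance, and related fields. To address the loss of pedestrian detection accuracy that object detection algorithms suffer in complex nighttime scenes and under occlusion, this paper adds a low-light image enhancement algorithm to the nighttime pedestrian detection task for joint training, introduces a nearby-aware module (nearby objects hallucinator, NOH), and proposes an improved nearby-aware pedestrian detection algorithm for nighttime surveillance scenes (nearby-aware surveillance pedestrian detection algorithm, NSPDet). Method To improve the accuracy of nighttime pedestrian detection, a low-light enhancement module (zero-reference deep curve estimation, Zero-DCE) is added to the baseline model. To reduce the missed and false detections caused by dense crowds and occlusion, NOH is used to model the distribution of surrounding pedestrians, and a pedestrian detection head (PedestrianHead) is proposed. To reduce model parameters and speed up inference, the model is made lightweight with depthwise separable convolution. Result Three groups of ablation experiments were conducted on the NightSurveillance dataset. Compared with the baseline model YOLOX (exceeding YOLO (you only look once) series), the most accurate NSPDet algorithm improves AP (average precision) and AR (average recall) by 10.1% and 7.2%, respectively. In addition, the lightweight NSPDet model has 16.4 M fewer parameters; its AP and AR drop by 7.6% and 6.2%, respectively, but it still outperforms the baseline model. Comparison experiments with other methods on the Caltech (Caltech pedestrian dataset), CityPersons (a diverse dataset for pedestrian detection), and NightOwls (a pedestrians at night dataset) datasets show that the proposed nighttime pedestrian detection algorithm has a low average false detection rate. Conclusion The proposed nighttime pedestrian detection algorithm improves the baseline model's nighttime pedestrian detection accuracy, supports real-time inference, and shows good robustness in complex nighttime scenes.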
NSPDet: real-time nearby-aware pedestrian detection algorithm for multi-scene surveillance at night
Gong An1, Li Zhonghao1, Liang Chenhong2 (1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China; 2. School of Informatics, Xiamen University, Xiamen 361104, China)
Objective Pedestrian detection is a widely studied topic in computer vision. It is also a basic and critical technology in automatic driving assistance systems, visual surveillance, and behavior recognition. In traffic environments, pedestrians and cyclists belong to the "vulnerable groups on the road". World Health Organization (WHO) statistics show that approximately half of all fatalities in road accidents involve pedestrians. Unlike conventional detection objects (e.g., automobiles) with relatively stable structural characteristics, pedestrians are nonrigid: their varied limb movements make their structure unstable, which complicates detection. Night scenes are even more difficult. However, domestic and international research on nighttime pedestrian detection remains insufficient. Under insufficient illumination and local overexposure, pedestrian detection algorithms lose accuracy, leading to missed and false detections. Therefore, nighttime pedestrian detection technology has important research and social value for ensuring pedestrian safety. Method Monitoring conditions at night are constrained by uneven and insufficient lighting. The acquired images are therefore underexposed, which reduces the effectiveness of pedestrian detection. To address this issue, this study adds a low-light enhancement module (Zero-DCE) to the detector to boost the model's nighttime detection performance. We feed the detector's regression loss and detected location information to the low-light enhancement module so that low-light image enhancement and pedestrian detection are trained jointly and the enhancement acts as a positive gain for detection. This approach maintains the regional continuity of pedestrian features in the image and avoids the degradation of detection accuracy caused by the pixel-level low-light
enhancement operation that destroys the features in the pedestrian region. Pedestrian detection has a long history. Strategies that use histograms of oriented gradients (HOG) to model human features, with a support vector machine (SVM) as the feature classifier, have been widely studied. However, these traditional methods rely on feature engineering, and hand-crafted features have low accuracy and generalize poorly. In recent years, deep learning algorithms have been applied to pedestrian detection. The convolutional neural network (CNN) can extract high-level features and has gradually become the mainstream approach. Depending on whether they rely on region proposals, deep learning-based pedestrian detection algorithms can be broadly divided into two-stage and one-stage methods. Two-stage methods first use sliding windows to find preselected regions in the image. Then, these regions are classified and regressed. Representative methods are R-CNN and Faster R-CNN. Detection algorithms based on region proposals capture rich features, so their detection accuracy is high. However, they suffer from redundant preselected regions and slow inference. One-stage methods do not rely on region proposals. Instead, they directly regress the target's position in the image, which simplifies the detection process and accelerates inference. Representative methods are single shot multibox detector (SSD), you only look once v3 (YOLOv3), and YOLOX, proposed by MEGVII. Considering both detection accuracy and inference speed, this study selects the one-stage method YOLOX as the baseline model and optimizes it specifically for night scenes. Additionally, a significant issue in pedestrian detection is the missed and false detections brought on by interclass occlusion and
dense crowds. The original non-maximum suppression (NMS) algorithm tends to delete valid detection boxes by mistake when numerous pedestrians are present and closely packed, which leads to missed pedestrians. To address this problem, the present study reconsiders the NMS strategy at the inference stage and introduces a non-maximum suppression algorithm, the nearby objects hallucinator (NOH), that incorporates the distribution information of nearby pedestrian targets. We eliminate the dependence of NOH on region proposals, allowing it to be ported to the one-stage object detection algorithm. The bounding box features predicted by YOLOX are pooled into the same feature space. Then, we use a simple fully connected module to build the location distribution and density information of nearby pedestrians required by NOH. The improved NOH module is combined with the original YOLOHead to form PedestrianHead, which produces the final pedestrian detection information. Experiments show that adding this fully connected module effectively reduces the missed detections caused by occlusion, and the inference speed is slightly improved. However, fully connected modules inevitably add redundant parameters to the network. Therefore, this study further investigates reducing the model size. Depthwise separable convolution is used in the lightweight model to maintain detection accuracy while reducing the computation required for inference. The computational cost of the lightweight model is reduced to 22.4 GFLOPs. In theory, our algorithm can meet the real-time inference needs of mobile devices. Result We divided the ablation experiments into three groups for verification on the NightSurveillance dataset. Compared with the baseline model (YOLOX), NSPDet increased the average precision (AP) and the average recall (AR) by 10.1% and 7.2%, respectively. In addition, the parameters of the lightweight NSPDet model are reduced by
16.4 M. Its AP and AR decrease by 7.6% and 6.2%, respectively. However, the lightweight NSPDet model still outperforms the baseline model. Comparison experiments with other methods on the Caltech, CityPersons, and NightOwls datasets show that the night pedestrian detection algorithm proposed in this study has a low average false detection rate. Conclusion The NSPDet algorithm proposed in this study improves the accuracy of the baseline model for pedestrian detection at night and supports real-time inference. It improves detection accuracy in various complex nighttime scenes, including low light, strong light interference, image blur, occlusion, and rainy weather. It has important application value for promoting research in autonomous driving and intelligent transportation.
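
The missed detections that the abstract attributes to the original NMS in crowded scenes can be illustrated with a minimal sketch. The code below is plain greedy NMS, not the NOH variant used in NSPDet; the function names `iou` and `greedy_nms`, the box coordinates, the scores, and the thresholds are illustrative assumptions and do not come from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS: visit boxes by descending score and keep a box
    only if its IoU with every already-kept box is at most the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical scene: three distinct pedestrians. Box 1 is a second person
# standing partly behind box 0; box 2 stands apart from both.
boxes = [(0, 0, 100, 200), (30, 0, 130, 200), (300, 0, 400, 200)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
print(greedy_nms(boxes, scores, iou_threshold=0.6))  # [0, 1, 2]
```

Boxes 0 and 1 overlap with an IoU of about 0.54, so at a threshold of 0.5 the occluded neighbor is suppressed even though it is a separate pedestrian. Raising the threshold to 0.6 keeps it, but a globally higher threshold generally admits more duplicate boxes elsewhere, which is the trade-off that motivates using information about nearby pedestrians.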