Multi-scale pedestrian detection fusing multi-layer features

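Zeng Jiexian1,2, Fang Qi1, Fu Xiang1, Leng Lu1 (1. School of Software, Nanchang Hangkong University, Nanchang 330063; 2. School of Science and Technology, Nanchang Hangkong University, Gongqingcheng 332020)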

Abstract
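Objective Pedestrian detection is widely used in autonomous driving and video surveillance and is an active research topic. Current deep-learning-based pedestrian detection algorithms produce false and missed detections when image resolution is low and pedestrians are small. To address this, a multi-scale pedestrian detection algorithm that fuses multi-layer features is proposed. Method First, for the pedestrian detection task, part of the deep residual network is removed, and feature maps are extracted from only three of its stages. The feature map extracted by the last of these layers is then enlarged by a factor of two with nearest-neighbor upsampling and added element-wise to the shallower features, fusing high-level features rich in semantic information with low-level features rich in detail. Finally, the three fused feature layers are each fed into a region proposal network, and softmax classification yields candidate boxes containing pedestrians, which completes the detection. Result Experiments on the Caltech pedestrian detection dataset show that at 0.1 false positives per image (FPPI), the miss rate of the proposed algorithm is only 57.88%, 3.07 percentage points lower than the 60.95% of the multi-scale convolutional neural network (MS-CNN), one of the best existing models. Conclusion Deep features carry rich semantic information and have large receptive fields, whereas shallow features carry position information and have small receptive fields. Fusing the two strengthens the deep features and gives them richer object position information. The fused multi-layer feature maps contain different degrees of detail and semantic information and work well for detecting pedestrians at different scales. Detecting pedestrians with the fused features therefore improves detection performance.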
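The fusion step (2× nearest-neighbor upsampling of the deeper map, then element-wise addition) can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested Python lists, with the deeper map exactly half the height and width of the shallower one; the function names are hypothetical, and any channel alignment between stages, which the abstract does not describe, is left out.

```python
from typing import List

# Single-channel H x W feature map; a simplification for illustration only.
FeatureMap = List[List[float]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbor 2x upsampling: each value is copied into a 2x2 block."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def fuse_by_addition(deep: FeatureMap, shallow: FeatureMap) -> FeatureMap:
    """Upsample the coarser (deeper) map and add it element-wise to the finer one."""
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled deep map must match the shallow map's size")
    return [[a + b for a, b in zip(r_up, r_sh)] for r_up, r_sh in zip(up, shallow)]


deep = [[1.0, 2.0],
        [3.0, 4.0]]                       # 2 x 2: coarse, semantically strong
shallow = [[0.1] * 4 for _ in range(4)]   # 4 x 4: fine, spatially detailed
fused = fuse_by_addition(deep, shallow)
for row in fused:
    print(row)
```

Addition, unlike concatenation, keeps the channel count unchanged, so the fused map has the same shape as the shallow input and can go straight to the next stage of the detector.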
Keywords
Multi-scale pedestrian detection algorithm with multi-layer features

Zeng Jiexian1,2, Fang Qi1, Fu Xiang1, Leng Lu1(1.School of Software, Nanchang Hangkong University, Nanchang 330063, China;2.School of Science and Technology, Nanchang Hangkong University, Gongqingcheng 332020, China)

Abstract
Objective When people look at a picture, they can classify it and understand all the information it contains, including the location and category of each object. Doing this computationally is called object detection, one of the basic research areas in computer vision. Object detection comprises several subtasks, such as pedestrian detection and skeleton detection; pedestrian detection is a key and difficult one. This study focuses on pedestrian detection in traffic scenes, one of the most valuable topics in the field and a key technology for intelligent video surveillance, autonomous driving, and intelligent transportation. In recent years, it has been a research focus in both academia and industry, and with the rapid growth of artificial intelligence, many computer vision techniques have come into wide use. Because real scenes are complex and pedestrians appear at widely varying scales, multi-scale pedestrian detection has great research value. Current deep-learning-based pedestrian detection algorithms produce false and missed detections when the resolution is low and pedestrians are small. To address this problem, a multi-scale pedestrian detection algorithm based on multi-layer features is proposed. Convolutional neural networks have raised pedestrian detection accuracy substantially and brought notable progress in practical applications, and the research enthusiasm generated by deep learning has led to major breakthroughs in pedestrian detection in complex scenes. Deep learning is likely to remain a major driver of pedestrian detection in the future.
Method The deep residual network (ResNet) is mainly used for multi-class classification. After analyzing the network, the residual units of its last stage and the fully connected layer are removed, and the network is used only to extract feature maps from three stages. The feature map extracted by the last layer is enlarged by a factor of two with nearest-neighbor upsampling and then added element-wise to the shallower feature map, combining features rich in high-level semantic information with features rich in low-level detail to improve detection. The three fused feature layers are fed into the region proposal network, and proposals containing pedestrians are obtained through softmax classification. Four experiments are designed, three of which verify the validity of the proposed method, and the results are compared with those of mainstream algorithms. The comparative experiments indicate that simply detecting on separate layers does not improve performance and that fusing multiple layers together is unsatisfactory. Adjacent-layer fusion is therefore chosen, and its multi-scale detection results are compared directly with detection on the deepest layer alone; adjacent-layer fusion performs better. Across all experiments, adjacent-layer fusion gives the best results, with a lower miss rate than the mainstream algorithms. The network is fully convolutional and is trained end to end with backpropagation and stochastic gradient descent. Each image yields many positive and negative candidate boxes, but optimizing over all of them directly biases the loss toward the negative samples, which far outnumber the positive ones.
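The softmax classification that turns region proposal network outputs into pedestrian proposals can be sketched as follows. This is a minimal illustration, assuming each proposal carries two raw scores (background, pedestrian); the 0.5 probability threshold, the (x1, y1, x2, y2) box format, and the function names are hypothetical and do not come from the paper. Box regression and post-processing such as non-maximum suppression, which a full proposal pipeline normally includes, are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical box format (x1, y1, x2, y2); the paper does not specify one.
Box = Tuple[float, float, float, float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pedestrian_proposals(
    boxes: List[Box],
    scores: List[Tuple[float, float]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[Tuple[Box, float]]:
    """Keep boxes whose softmax pedestrian probability exceeds `threshold`.

    scores[i] holds the raw (background, pedestrian) scores for boxes[i].
    Results are sorted by pedestrian probability, highest first.
    """
    kept = []
    for box, (bg, ped) in zip(boxes, scores):
        p_ped = softmax([bg, ped])[1]  # index 1 = pedestrian class
        if p_ped > threshold:
            kept.append((box, p_ped))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


boxes = [(0.0, 0.0, 10.0, 30.0), (5.0, 5.0, 15.0, 35.0), (20.0, 0.0, 30.0, 30.0)]
scores = [(0.2, 2.0), (1.5, 0.1), (0.0, 0.0)]
kept = pedestrian_proposals(boxes, scores)
print(kept)
```

With two classes, the pedestrian probability reduces to a logistic function of the score difference, so only the first box (difference 1.8) clears the threshold here; the third box sits at exactly 0.5 and is rejected by the strict comparison.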
For each image, 256 anchors are sampled to compute the loss, with positive and negative samples in a 1:1 ratio. All new layers in the network are randomly initialized with weights drawn from a zero-mean Gaussian distribution with a standard deviation of 0.01; the other layers are initialized from a model pre-trained for classification. Training runs for two epochs in total. Result On the Caltech pedestrian detection dataset, at 0.1 false positives per image (FPPI), the miss rate of the proposed algorithm is only 57.88%, 3.07 percentage points lower than the 60.95% of one of the best models, MS-CNN (multi-scale convolutional neural network). Comparative experiments were also conducted. The overall miss rate of Ped-RPN is 64.55%, worse than that of the proposed algorithm. The method that splits the features into layers and then detects on each (Ped-muti-RPN) has a miss rate of 77.15%, worse than Ped-RPN. Ped-fused-RPN, a detection algorithm that fuses multiple layers, reaches 61.32%, better than Ped-RPN but still worse than the proposed algorithm. Conclusion Small-scale pedestrians appear blurred in images, which makes them very hard to detect and degrades overall multi-scale detection. To counter the sharp drop in detection performance for small pedestrians, this paper fuses deep semantic information with shallow detail features so that features at every scale carry rich semantic information. Deep features carry rich semantic information and have large receptive fields, whereas shallow features carry position information and have small receptive fields. Fusing the two enhances the deep features and gives them rich object position information.
The fused feature maps contain different levels of detail and semantic information and perform well at detecting pedestrians of different scales.
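The per-image anchor sampling used in training (256 anchors, positives and negatives at 1:1) can be sketched as below. The label convention (1 positive, 0 negative, -1 ignored) and the rule of filling the batch with extra negatives when an image has fewer than 128 positive anchors are assumptions taken from common Faster R-CNN practice, not details stated in this abstract; the function name is hypothetical.

```python
import random
from typing import List, Sequence


def sample_anchor_minibatch(
    labels: Sequence[int],
    batch_size: int = 256,
    positive_fraction: float = 0.5,  # 1:1 positives to negatives
    seed: int = 0,
) -> List[int]:
    """Pick the anchor indices that contribute to one image's loss.

    Assumed label convention: 1 = positive, 0 = negative, -1 = ignored.
    Up to batch_size * positive_fraction positives are drawn at random; the
    remaining slots are filled with randomly drawn negatives (assumption:
    Faster R-CNN-style padding when positives are scarce).
    """
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    num_pos = min(len(pos), int(batch_size * positive_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    return rng.sample(pos, num_pos) + rng.sample(neg, num_neg)


# Scarce positives: only 40 positive anchors, so 216 negatives fill the batch.
labels = [1] * 40 + [0] * 5000 + [-1] * 100
batch = sample_anchor_minibatch(labels)
n_pos = sum(1 for i in batch if labels[i] == 1)
print(len(batch), n_pos)  # 256 40
```

With ample positives, for example 500 positive and 5000 negative anchors, the same call returns 128 of each, which is the 1:1 ratio described above; capping the negatives is what keeps the loss from being dominated by background anchors.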
Keywords
