Small-target vehicle detection fusing non-adjacent hop connections and multi-scale residual structures
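Zhang Hao, Dong Kai-long, Gao Shang-bing, Liu Bin, Hua Qi-fan, Zhang Ge (Huaiyin Institute of Technology; School of Computer and Software Engineering, Huaiyin Institute of Technology)
Abstract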
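Objective Object detection models based on deep convolutional neural networks are easily affected by complex conditions (occlusion, illumination, long distance, small targets, etc.), which cause missed detections, false detections, and blurred target contour features, and existing models are difficult to generalize directly to small-target detection in aerial photography scenarios. To address these problems, this paper proposes a small-target vehicle detection algorithm that fuses non-adjacent hop connections and multi-scale residual structures (non-adjacent hop network you only look once version 5s multi-scale residual edge contour feature extraction strategy, NHN-YOLOv5s-MREFE). Method First, four detection layers of different scales are designed, each responsible for detecting vehicles of a particular size according to the size of its receptive field. Second, drawing on the dense skip connections of DenseNet, a non-adjacent hop feature pyramid structure (non-adjacent hop network, NHN) is constructed. Its skip-and-add strategy strengthens information interaction between non-adjacent levels while fusing more unaltered original information, which counteracts the gradual dilution of location information during propagation and effectively lowers the false detection rate of the model. Third, with the aim of reducing feature loss, deconvolution and a parallel strategy are introduced; they enrich small-target detail information by filling in pixels through learned parameters and by increasing the amount of information carried in each dimension. Next, a multi-scale residual edge contour feature extraction strategy (MREFE) is designed. Following the principle of progressive feature refinement, it builds a multi-scale residual structure, captures multi-scale information at different levels with two parallel branches, and extracts image edge features by pixel-wise subtraction between the high-level semantic information at multiple scales and the initial shallow information, thereby assisting the network in target classification. Finally, the K-Means++ algorithm is used to disperse the initial cluster centers, bringing the clustering result closer to the global optimum and accelerating model convergence. Result Experiments on an aerial image dataset containing multiple vehicle types in two scenarios, intersections and along lanes, show that the proposed algorithm achieves the best overall performance when compared with four mainstream object detection methods. Compared with the baseline model (you only look once version 5s, YOLOv5s), precision (P), recall (R), and mean average precision (mAP) improve by 13.7%, 1.6%, and 8.1%, respectively. Conclusion The proposed detection algorithm balances detection speed and accuracy well: at the cost of a very small increase in parameters, it significantly improves detection accuracy, adapts to complex traffic environments, and meets the real-time requirements of small-target vehicle detection in aerial photography scenarios. It has high application value in measuring and compiling statistics on traffic parameters such as flow and density, and in vehicle localization and tracking.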
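The edge-extraction idea behind MREFE can be illustrated on a toy single-channel map. A coarser representation discards high-frequency detail, so subtracting it, once resampled back to full resolution, from the shallow map leaves a response concentrated at edges. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: fixed 2 x 2 and 4 x 4 average pooling stand in for the learned high-semantic branches, nearest-neighbour upsampling realigns the maps, and summing absolute residuals over the two scales is a hypothetical combination rule. All function names (`avg_pool`, `upsample_nearest`, `edge_residual`, `multi_scale_edges`) are illustrative.

```python
from typing import List, Sequence

Grid = List[List[float]]  # a single-channel feature map, row-major


def avg_pool(fmap: Grid, k: int) -> Grid:
    """k x k average pooling with stride k (map size assumed divisible by k)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(w // k)
        ]
        for i in range(h // k)
    ]


def upsample_nearest(fmap: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i // k][j // k] for j in range(w * k)] for i in range(h * k)]


def edge_residual(shallow: Grid, k: int) -> Grid:
    """Pixel-wise difference between the shallow map and its coarse scale-k version."""
    coarse = upsample_nearest(avg_pool(shallow, k), k)
    return [[s - c for s, c in zip(s_row, c_row)] for s_row, c_row in zip(shallow, coarse)]


def multi_scale_edges(shallow: Grid, scales: Sequence[int] = (2, 4)) -> Grid:
    """Sum of absolute residuals over several scales (hypothetical combination rule)."""
    h, w = len(shallow), len(shallow[0])
    out = [[0.0] * w for _ in range(h)]
    for k in scales:
        res = edge_residual(shallow, k)
        for i in range(h):
            for j in range(w):
                out[i][j] += abs(res[i][j])
    return out


if __name__ == "__main__":
    # 8 x 8 map with a vertical step between columns 2 and 3.
    step = [[0.0] * 3 + [1.0] * 5 for _ in range(8)]
    print(multi_scale_edges(step)[0])  # [0.25, 0.25, 0.75, 1.25, 0.0, 0.0, 0.0, 0.0]
```

The response peaks beside the step and is zero across the flat region on the right; a constant map produces zero everywhere, since its coarse version equals the map itself.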
Keywords
Small-target vehicle detection by fusing non-adjacent hopping and multi-scale residual structures
Zhang Hao, Dong Kai-long, Gao Shang-bing, Liu Bin, Hua Qi-fan, Zhang Ge (School of Computer and Software Engineering, Huaiyin Institute of Technology)
Abstract
Objective Object detection models based on deep convolutional neural networks are susceptible to complex conditions (occlusion, illumination, long distance, small targets, etc.), which lead to missed detections, false detections, and blurred target contour features; moreover, existing models are difficult to generalize directly to small-target detection in aerial photography scenarios. To address these problems, this paper proposes a small-target vehicle detection algorithm that fuses non-adjacent hopping and multi-scale residual structures (NHN-YOLOv5s-MREFE). Method First, four detection layers of different scales are designed, each responsible for detecting vehicles of a particular size according to its receptive field. Second, drawing on the dense skip connections of DenseNet, a non-adjacent hopping feature pyramid structure (NHN) is constructed. Its skip-and-add strategy strengthens information interaction between non-adjacent levels while fusing more unaltered original information, which counteracts the gradual dilution of location information as it propagates through the network and effectively reduces the false detection rate of the model. Third, to reduce feature loss, deconvolution and a parallel strategy are introduced to enrich small-target detail information, using learned parameters to fill in pixels and expanding the amount of information carried in each dimension.
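The deconvolution step can be made concrete with a single-channel transposed convolution: every input pixel scatters a weighted copy of the kernel into a larger output grid and overlapping contributions are summed, so the values of the newly created pixels are set by the kernel weights, which a network learns during training. The sketch below assumes stride 2, no padding, and no bias; the kernel size, channel counts, and how the parallel branch is combined in NHN-YOLOv5s-MREFE are not given in the abstract, and the function name `conv_transpose2d` is our own (it follows the scatter form of a transposed convolution rather than any particular library's API).

```python
from typing import List

Grid = List[List[float]]  # single-channel map, row-major


def conv_transpose2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Single-channel 2-D transposed convolution with no padding and no bias.

    Each input pixel scatters x[i][j] * kernel into the output at offset
    (i * stride, j * stride); overlapping contributions are summed.
    Output size: (H - 1) * stride + kH by (W - 1) * stride + kW.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * ((w - 1) * stride + kw) for _ in range((h - 1) * stride + kh)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    # An all-ones 2 x 2 kernel reproduces nearest-neighbour upsampling ...
    print(conv_transpose2d(x, [[1.0, 1.0], [1.0, 1.0]]))
    # ... while a different (e.g. learned) kernel fills the new pixels differently.
    print(conv_transpose2d(x, [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]))
```

With the all-ones kernel the 2 x 2 input becomes a 4 x 4 block copy; with the 3 x 3 kernel the output is 5 x 5 and the centre pixel is a weighted blend of all four inputs (2.5 here). Making these weights trainable is what separates deconvolution from fixed interpolation.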
Next, a multi-scale residual edge contour feature extraction strategy (MREFE) is designed. Following the principle of progressive feature refinement, it builds a multi-scale residual structure that captures multi-scale information at different levels with two parallel branches, and extracts image edge features by pixel-wise subtraction between the high-level semantic information at multiple scales and the initial shallow information, thereby assisting the network in target classification. Finally, the K-Means++ algorithm is used to disperse the initial cluster centers, bringing the clustering result closer to the global optimum and accelerating model convergence. Result Experimental results on an aerial image dataset containing multiple vehicle types in two scenarios, intersections and along lanes, show that the proposed algorithm achieves the best overall performance when compared with four mainstream object detection methods. Compared with the baseline model (YOLOv5s), precision (P), recall (R), and mean average precision (mAP) improve by 13.7%, 1.6%, and 8.1%, respectively. Conclusion The proposed detection algorithm balances detection speed and accuracy well: at the cost of a very small increase in the number of parameters, it significantly improves detection accuracy, adapts to complex traffic environments, and meets the real-time requirements of small-target vehicle detection in aerial photography scenarios.
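The anchor-clustering step can be sketched as K-Means++ over ground-truth box sizes. The abstract names only K-Means++; this sketch additionally assumes the 1 - IoU distance on box width and height that YOLOv2 introduced for anchor clustering, mean-based centroid updates, and a fixed random seed. K-Means++ seeding draws each new initial center with probability proportional to its squared distance from the nearest existing center, which spreads the initial centers apart instead of sampling them uniformly. Function names are illustrative, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box, in pixels


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def dist(a: Box, b: Box) -> float:
    """Assumed clustering distance: 1 - IoU (as in YOLOv2 anchor clustering)."""
    return 1.0 - iou_wh(a, b)


def kmeanspp_init(boxes: Sequence[Box], k: int, rng: random.Random) -> List[Box]:
    """K-Means++ seeding: draw each new center with probability proportional to D(x)^2."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(dist(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with an existing center
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def kmeans_anchors(boxes: Sequence[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box sizes into k anchors; returned anchors are sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: each box joins the center with the smallest 1 - IoU.
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: dist(b, centers[i]))
            clusters[nearest].append(b)
        # Update step: mean width/height per cluster; an empty cluster keeps its center.
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    boxes = [(8, 8), (10, 9), (9, 11), (30, 32), (34, 30), (32, 33), (120, 90), (130, 100), (128, 96)]
    print(kmeans_anchors(boxes, k=3))
```

Anchors are returned sorted by area so that, in a multi-layer detector, the smallest can go to the highest-resolution detection layer; how many anchors the paper assigns to each of its four detection layers is not stated in the abstract.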
Keywords