融合非临近跳连与多尺度残差结构的小目标车辆检测
Small-target vehicle detection by fusing non-adjacent hopping and multi-scale residual structures
- 2023年28卷第12期 页码:3797-3809
纸质出版日期: 2023-12-16
DOI: 10.11834/jig.230220
张浩, 董锴龙, 高尚兵, 刘斌, 华奇凡, 张格. 2023. 融合非临近跳连与多尺度残差结构的小目标车辆检测. 中国图象图形学报, 28(12):3797-3809
Zhang Hao, Dong Kailong, Gao Shangbing, Liu Bin, Hua Qifan, Zhang Ge. 2023. Small-target vehicle detection by fusing non-adjacent hopping and multi-scale residual structures. Journal of Image and Graphics, 28(12):3797-3809
目的
基于深度卷积神经网络的目标检测模型易受复杂环境(遮挡、光照、远距离、小目标等)影响,导致漏检、误检和目标轮廓特征模糊,现有模型难以直接泛化到航拍场景下的小目标检测任务。为有效解决上述问题,提出一种融合非临近跳连与多尺度残差结构的小目标车辆检测算法(non-adjacent hop network you only look once version 5s multi-scale residual edge contour feature extraction strategy, NHN-YOLOv5s-MREFE)。
方法
首先,设计4种不同尺度的检测层,根据自身感受野大小,针对性地负责不同尺寸车辆的检测。其次,借鉴DenseNet密集跳连的思想,构建一种非临近跳连特征金字塔结构(non-adjacent hop network,NHN),通过跳连相加策略,在强化非临近层次信息交互的同时融合更多未被影响的原始信息,解决位置信息在传递过程中被逐渐稀释的问题,有效降低了模型的误检率。然后,以减少特征丢失为前提,引入反卷积和并行策略,通过参数学习实现像素填充和突破每一维度信息量的方式,扩充小目标细节信息。接着,设计一种多尺度残差边缘轮廓特征提取策略(multi-scale residual edge contour feature extraction strategy,MREFE),遵循特征逐渐细化的原则,构建多尺度残差结构,采用双分支并行的方法捕获不同层级的多尺度信息,通过多尺度下的高语义信息与初始浅层信息的逐像素作差实现图像边缘特征提取,进而辅助网络模型完成目标分类。最后,采用K-Means++算法使聚类中心分散化,促使结果达到全局最优,加速模型收敛。
结果
实验结果表明,非临近跳连的特征金字塔与多尺度残差结构的多模态融合策略,在提升模型运行效率,降低模型计算资源消耗的同时,有效提升了小目标检测的准确性和鲁棒性。通过多场景、多时段、多角度的样本数据增强,强化了模型在不同场景下的泛化能力。最后,在十字路口、沿途车道双场景下包含多种车辆类型的航拍图像数据集上,对比分析4种主流的目标检测方法,本文算法的综合性能最优。相较于基准模型(YOLOv5s),精确率、召回率和平均精度均值分别提升了13.7%、1.6%和8.1%。
结论
本文算法可以较好地平衡检测速度与精度,以增加极小的参数量为代价,显著地提升了检测精度, 并能够自适应复杂的交通环境,满足航拍场景下小目标车辆检测的实时性需求,在交通流量、密度等参数的测量和统计,车辆定位与跟踪等场景下有较高的应用价值。
Objective
Target detection models based on deep convolutional neural networks are susceptible to complex environments (e.g., occlusion, illumination changes, long distances, and small targets), which leads to missed detections, false detections, and blurred target contour features. Moreover, existing models cannot be easily generalized to small-target detection tasks in aerial photography scenarios. To solve these problems effectively, this paper proposes a small-target vehicle detection algorithm called non-adjacent hop network you only look once version 5s multi-scale residual edge contour feature extraction strategy (NHN-YOLOv5s-MREFE), which fuses non-adjacent hopping and multi-scale residual structures.
Method
First, four detection layers of different scales are designed, each responsible for detecting vehicles of a particular size range according to its receptive field. Second, drawing on the dense skip connections of DenseNet, a non-adjacent hop feature pyramid structure (non-adjacent hop network, NHN) is constructed. Through a skip-and-add strategy, this structure strengthens the information interaction between non-adjacent layers while fusing additional unaffected original information, thereby addressing the problem of location information being gradually diluted during transmission and effectively reducing the false detection rate of the model. Third, on the premise of minimizing feature loss, deconvolution and a parallel strategy are introduced to enrich the detail information of small targets: pixel filling is achieved through parameter learning, and the amount of information carried by each dimension is expanded. Fourth, a multi-scale residual edge contour feature extraction strategy (MREFE) is designed. Following the principle of gradual feature refinement, a multi-scale residual structure is built that captures multi-scale information at different levels with a two-branch parallel approach; image edge features are then extracted through the pixel-by-pixel difference between the high-level semantic information at multiple scales and the initial shallow information, which assists the network model in completing target classification. Finally, the K-Means++ algorithm is used to disperse the clustering centers, driving the results toward the global optimum and accelerating the convergence of the model.
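To make the non-adjacent skip-and-add fusion and the deconvolution-based upsampling concrete, the following PyTorch sketch shows one possible form of such a module. It is a minimal illustration under stated assumptions, not the authors' released implementation: the module name, channel counts, stride, and the choice of which two pyramid levels are fused are all assumptions.

```python
import torch
import torch.nn as nn


class NonAdjacentHopFusion(nn.Module):
    """Illustrative sketch: fuse a non-adjacent shallow level into a deep level.

    The shallow feature keeps more of the original location detail; adding it
    to a coarser level counteracts the dilution of position information.
    """

    def __init__(self, shallow_ch: int, deep_ch: int, stride: int = 4):
        super().__init__()
        # 1x1 conv aligns channel counts before the element-wise addition.
        self.align = nn.Conv2d(shallow_ch, deep_ch, kernel_size=1)
        # Learnable upsampling of the deep map back to the shallow resolution:
        # deconvolution fills pixels via learned parameters rather than
        # parameter-free interpolation.
        self.deconv = nn.ConvTranspose2d(deep_ch, deep_ch,
                                         kernel_size=stride, stride=stride)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        upsampled = self.deconv(deep)           # match spatial size
        return upsampled + self.align(shallow)  # skip-and-add fusion


if __name__ == "__main__":
    p2 = torch.randn(1, 128, 80, 80)   # shallow, high-resolution level
    p4 = torch.randn(1, 512, 20, 20)   # non-adjacent, deeper level
    fused = NonAdjacentHopFusion(128, 512, stride=4)(p2, p4)
    print(fused.shape)                 # torch.Size([1, 512, 80, 80])
```

Because ConvTranspose2d learns its upsampling weights, the pixel filling is data-driven rather than fixed interpolation, which is the property the Method relies on for preserving small-target detail.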
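The edge-contour idea of MREFE, that is, taking a pixel-by-pixel difference between multi-scale high-semantic features and the initial shallow feature, can be sketched roughly as below. The two parallel branches with 3×3 and 5×5 kernels, the activation, and the channel layout are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn


class MultiScaleResidualEdge(nn.Module):
    """Rough sketch of a two-branch multi-scale residual block whose output
    approximates edge contours via a pixel-wise difference against the input."""

    def __init__(self, channels: int):
        super().__init__()
        # Two parallel branches with different receptive fields capture
        # multi-scale semantic context of the same feature map.
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.SiLU(),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.BatchNorm2d(channels), nn.SiLU(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, shallow: torch.Tensor) -> torch.Tensor:
        semantic = self.fuse(torch.cat([self.branch3(shallow),
                                        self.branch5(shallow)], dim=1))
        # Pixel-wise difference between high-semantic and initial shallow
        # information highlights boundaries (an edge-like residual).
        return semantic - shallow


if __name__ == "__main__":
    x = torch.randn(1, 64, 160, 160)
    print(MultiScaleResidualEdge(64)(x).shape)  # torch.Size([1, 64, 160, 160])
```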
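Re-estimating anchor priors with K-Means++ initialization can be reproduced in a few lines with scikit-learn, as in the hedged sketch below. The box sizes are random placeholders standing in for the training-set label statistics, and 12 clusters (3 anchors for each of the 4 detection scales) is an assumption; YOLO-style pipelines often cluster with an IoU-based distance, whereas plain Euclidean K-Means is used here for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder (width, height) pairs of ground-truth boxes in pixels;
# in practice these come from the training-set annotations.
rng = np.random.default_rng(0)
wh = rng.uniform(low=4, high=128, size=(5000, 2))

# init="k-means++" spreads the initial centers apart, the property relied on
# here to reach a better optimum and speed up convergence.
kmeans = KMeans(n_clusters=12, init="k-means++", n_init=10, random_state=0).fit(wh)
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda a: a[0] * a[1])
for w, h in anchors:
    print(f"{w:.1f} x {h:.1f}")
```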
Result
Experimental results show that the multimodal fusion strategy of the non-adjacent hop feature pyramid and the multi-scale residual structure effectively improves the accuracy and robustness of small target detection while improving the operation efficiency of the model and reducing its computational resource consumption. Data augmentation with samples from multiple scenarios, time periods, and viewing angles strengthens the generalization ability of the model across different scenarios. Finally, on an aerial image dataset containing multiple vehicle types in two scenarios, intersections and roadside lanes, NHN-YOLOv5s-MREFE is compared with four mainstream target detection methods and achieves the best overall performance. Compared with the baseline model (YOLOv5s), its precision, recall, and mean average precision are improved by 13.7%, 1.6%, and 8.1%, respectively.
Conclusion
The proposed NHN-YOLOv5s-MREFE balances detection speed and accuracy well, significantly improving detection accuracy at the cost of only a very small increase in the number of parameters. The algorithm also adapts to complex traffic environments and meets the real-time requirements of small-target vehicle detection in aerial photography scenarios, giving it high application value in the measurement and statistics of traffic parameters such as flow and density as well as in vehicle localization and tracking.
智能交通；目标检测；深度学习；非临近跳连；多尺度残差结构
intelligent transportation; target detection; deep learning; non-adjacent hopping; multi-scale residual structure
Azevedo C L, Cardoso J L, Ben-Akiva M, Costeira J P and Marques M. 2014. Automatic vehicle trajectory extraction by aerial remote sensing. Procedia-Social and Behavioral Sciences, 111: 849-858 [DOI: 10.1016/j.sbspro.2014.01.119]
Cao J L, Li Y L, Sun H Q, Xie J, Huang K Q and Pang Y W. 2022. A survey on deep learning based visual object detection. Journal of Image and Graphics, 27(6): 1697-1722
曹家乐, 李亚利, 孙汉卿, 谢今, 黄凯奇, 庞彦伟. 2022. 基于深度学习的视觉目标检测技术综述. 中国图象图形学报, 27(6): 1697-1722 [DOI: 10.11834/jig.220069]
Cheng X B, Qiu G H, Jiang Y and Zhu Z M. 2021. An improved small object detection method based on Yolo V3. Pattern Analysis and Applications, 24(3): 1347-1355 [DOI: 10.1007/s10044-021-00989-7]
Dai K, Xu L B, Huang S Y and Li Y L. 2022. Single stage object detection algorithm based on fusing strategy optimization selection and dual attention mechanism. Journal of Image and Graphics, 27(8): 2430-2443
戴坤, 许立波, 黄世旸, 李鋆铃. 2022. 融合策略优选和双注意力的单阶段目标检测. 中国图象图形学报, 27(8): 2430-2443 [DOI: 10.11834/jig.210204]
Dai Y M, Wu Y Q, Zhou F and Barnard K. 2021. Asymmetric contextual modulation for infrared small target detection//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 949-958 [DOI: 10.1109/WACV48630.2021.00099]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177]
Du W H, Li D X, Wang Q N and Wu S. 2022. Moving target detection based on improved frame difference and edge extraction algorithm. Science Technology and Engineering, 22(5): 1944-1949
杜文汉, 李东兴, 王倩楠, 武帅. 2022. 融合改进帧差和边缘提取算法的运动目标检测. 科学技术与工程, 22(5): 1944-1949 [DOI: 10.3969/j.issn.1671-1815.2022.05.027]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI: 10.1109/TPAMI.2015.2389824]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Jiang Z T, Xiao Y, Zhang S Q, Zhu L H, He Y T and Zhai F S. 2023. Low-illumination object detection method based on Dark-YOLO. Journal of Computer-Aided Design and Computer Graphics, 35(3): 441-451
江泽涛, 肖芸, 张少钦, 朱玲红, 何玉婷, 翟丰硕. 2023. 基于Dark-YOLO的低照度目标检测方法. 计算机辅助设计与图形学学报, 35(3): 441-451 [DOI: 10.3724/SP.J.1089.2023.19354]
Lee C, Kim H J and Oh K W. 2016. Comparison of faster R-CNN models for object detection//Proceedings of the 16th International Conference on Control, Automation and Systems. Gyeongju, Korea (South): IEEE: 107-110 [DOI: 10.1109/ICCAS.2016.7832305]
Li B Y, Xiao C, Wang L G, Wang Y Q, Lin Z P, Li M, An W and Guo Y L. 2023. Dense nested attention network for infrared small target detection. IEEE Transactions on Image Processing, 32: 1745-1758 [DOI: 10.1109/TIP.2022.3199107]
Li Y C, Chen Y T, Yuan S, Liu J L, Zhao X, Yang Y and Liu Y H. 2021. Vehicle detection from road image sequences for intelligent traffic scheduling. Computers and Electrical Engineering, 95: #107406 [DOI: 10.1016/j.compeleceng.2021.107406]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Liu S, Qi L, Qin H F, Shi J P and Jia J Y. 2018. Path aggregation network for instance segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8759-8768 [DOI: 10.1109/CVPR.2018.00913]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Luo X Z, Luo X N and Liu T J. 2022. Improved Yolo v3 via feature map design and anchor box design//Proceedings of the SPIE 12158, International Conference on Computer Vision and Pattern Analysis. Guangzhou, China: SPIE: 39-49 [DOI: 10.1117/12.2626943]
Niknejad H T, Takeuchi A, Mita S and McAllester D. 2012. On-road multivehicle tracking using deformable object model and particle filter with improved likelihood estimation. IEEE Transactions on Intelligent Transportation Systems, 13(2): 748-758 [DOI: 10.1109/TITS.2012.2187894]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Wang H X, Zhang Y S, Song B, Chen D S and Yang Y. 2023. NAS-FPNLite object detection method fused with cross stage connection and inverted residual. Journal of Image and Graphics, 28(4): 1004-1018
王红霞, 张永善, 宋邦, 陈德山, 杨益. 2023. 融合跨阶段连接与倒残差的NAS-FPNLite目标检测方法. 中国图象图形学报, 28(4): 1004-1018 [DOI: 10.11834/jig.211099]
Wang J M, Lai X F, Ye L, Zuo D W and Zhang L P. 2021. Medical image deblur using generative adversarial networks with channel attention. Computer Science, 48(S1): 101-106
王建明, 黎向锋, 叶磊, 左敦稳, 张丽萍. 2021. 基于信道注意结构的生成对抗网络医学图像去模糊. 计算机科学, 48(S1): 101-106 [DOI: 10.11896/jsjkx.200600144]
Zhang W, Liu C S, Chang F L and Song Y. 2020. Multi-scale and occlusion aware network for vehicle detection and segmentation on UAV aerial images. Remote Sensing, 12(11): #1760 [DOI: 10.3390/rs12111760]
Zhang Y, Yuan B, Zhang J, Li Z W, Pang C X and Dong C H. 2022. Lightweight PM-YOLO network model for moving object recognition on the distribution network side//Proceedings of the 2nd Asia-Pacific Conference on Communications Technology and Computer Science. Shenyang, China: IEEE: 508-516 [DOI: 10.1109/ACCTCS53867.2022.00109]