Camouflage target detection fusing an attention mechanism with a multi-detection-layer structure

赖杰, 彭锐晖, 孙殿星, 黄杰(哈尔滨工程大学)

Abstract
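Objective Camouflage targets are an important class of objects in object detection research. Because such targets blend closely into the background, have weak visual edges, and carry little feature information, conventional detection algorithms are prone to missed alarms and false alarms, and their detection accuracy is low. To address these difficulties, this paper builds on YOLOv5 and proposes a camouflage target detection algorithm based on multiple detection layers and adaptive weights (Algorithm for detecting camouflage targets based on multi-detection layers and adaptive weight, MAH-YOLOv5). Method First, a non-salient target detection layer is added to the network prediction head to improve the network's perception of targets that occupy very few pixels and carry little semantic information. Second, an attention mechanism is fused into the feature extraction backbone to adjust the weights the convolutional network assigns to targets with insufficient feature information, so that it focuses more on the camouflage targets to be detected. Third, a multi-scale training strategy is used during training to further improve the robustness and generalization ability of the model. Finally, missed alarm and false alarm metrics for military target detection are defined, and a comprehensive detection index for camouflage targets is proposed. Results The model was trained and validated on a camouflage dataset collected by the research group. On this self-built dataset, the proposed method reaches an mAP (mean average precision) of 76.64%, which is 3.89% higher than YOLOv5; its missed detection rate is 8.53% and its false alarm rate only 0.14%, which are 2.75% and 0.56% lower than those of YOLOv5, respectively; and its comprehensive detection index for camouflage targets reaches 88.17%. Compared with the other algorithms, the comprehensive detection index of the proposed method is second only to state-of-the-art algorithms such as YOLOv8. Conclusion The proposed method substantially improves recognition accuracy, missed detection rate, and other metrics, has the best comprehensive detection capability, and can provide technical support and a reference for the fast, high-accuracy detection and recognition of camouflaged battlefield targets.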
Keywords
Camouflage target detection based on an attention mechanism and a multi-detection-layer structure

LAI Jie, PENG Ruihui, SUN Dianxing, HUANG Jie(Harbin Engineering University)

Abstract
Objective Camouflage target detection is an important research area in computer vision. Its main goal is to extract the position and category of a target from a complex background. Besides its wide military use, camouflage target detection has significant application and research value in medical image segmentation, industrial defect detection, agricultural fruit detection, and other fields. A camouflaged target typically blends closely into the surrounding background and has weak visual edges, low resolution, and insufficient feature information. As a result, conventional object detection algorithms struggle to meet the requirements of camouflage target detection and typically show a high missed detection rate and low detection accuracy. To address these issues, this work proposes a camouflage target detection approach based on YOLOv5 (MAH-YOLOv5). Method First, YOLOv5 detects small, medium, and large objects in the network prediction head with detection layers at three scales: 80×80, 40×40, and 20×20, respectively. The small-object detection layer, however, can only recognize objects larger than 8×8 pixels, so targets that occupy an extremely small share of the image are overlooked. A non-salient target detection layer is therefore added to the prediction head to improve the network's perception of targets with insufficient feature information and to reduce missed detections and false alarms during detection and recognition. Second, a convolutional neural network is used for feature extraction, but during extraction every part of the image is given the same weight. This makes it impossible to concentrate on extracting the target's useful information and wastes computing resources.
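The relation between the three grid sizes quoted above and the 8×8 pixel limit can be illustrated with a short calculation. This is only a sketch under stated assumptions: the 640×640 input size is the common YOLOv5 default and is not given in the abstract, and the stride of 4 for the added layer is a hypothetical value chosen for illustration, since the abstract does not state the resolution of the new detection layer.

```python
# Assumption: 640x640 input (common YOLOv5 default); the abstract gives only grid sizes.
INPUT_SIZE = 640

# Strides of the three stock YOLOv5 detection layers.
BASE_STRIDES = (8, 16, 32)

# Hypothetical stride for the added detection layer; not stated in the abstract.
EXTRA_STRIDE = 4


def grid_size(input_size: int, stride: int) -> int:
    """Number of grid cells per side for a detection layer with the given stride."""
    return input_size // stride


base_grids = [grid_size(INPUT_SIZE, s) for s in BASE_STRIDES]
extra_grid = grid_size(INPUT_SIZE, EXTRA_STRIDE)

for stride in (EXTRA_STRIDE, *BASE_STRIDES):
    g = grid_size(INPUT_SIZE, stride)
    # Each grid cell summarizes a stride x stride patch of the input image.
    print(f"stride {stride:>2}: {g}x{g} grid, one cell covers {stride}x{stride} pixels")
```

Under these assumptions the finest stock layer maps one cell to an 8×8 pixel patch, which matches the limit quoted above, and a finer layer would shrink that patch accordingly.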
The convolutional block attention module (CBAM) is therefore introduced into the feature extraction backbone to make better use of target feature information. CBAM consists of two components, channel attention and spatial attention. It fuses attention features along the channel and spatial dimensions and adjusts the weights the network assigns to targets with insufficient feature information, so that the network pays more attention to the camouflage targets to be detected and the mean detection accuracy for camouflage targets improves. Third, a multi-scale training method is used during training to increase the diversity of the dataset and, through scale changes, to improve the model's robustness and generalization ability. Finally, missed alarm and false alarm metrics for military target detection are defined, and a comprehensive detection index for camouflage targets is proposed; combined with mean detection accuracy and speed, it provides a way to compare different methods quantitatively. Results The model was trained and validated on the research group's camouflage dataset, which contains 3200 training samples and 1100 test samples, and was compared with Faster R-CNN, YOLOv4-tiny, SSD, DETR, YOLOX, YOLOv7, YOLOv8, and other algorithms. On this self-built dataset, the proposed method achieves a mean average precision (mAP) of 76.64%, a detection speed of 53 frames per second (FPS), a missed detection rate (MA) of 8.53%, a false alarm rate (FA) of only 0.14%, and a comprehensive detection index for camouflage targets of 88.17%. Compared with YOLOv5, the mAP is 3.89% higher, the MA is 2.75% lower, the FA is 0.56% lower, and the comprehensive detection index is 0.74% higher.
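The comparison with YOLOv5 can be made concrete with a small calculation. The abstract does not say whether the reported changes are absolute or relative; the sketch below assumes they are absolute differences in percentage points, and the resulting YOLOv5 values are implied figures under that assumption, not numbers reported by the authors.

```python
# Values reported for MAH-YOLOv5 in the abstract (percent).
reported = {"mAP": 76.64, "MA": 8.53, "FA": 0.14, "index": 88.17}

# Reported change relative to YOLOv5; positive means MAH-YOLOv5 is higher.
# Assumption: each change is an absolute difference in percentage points.
delta_vs_yolov5 = {"mAP": +3.89, "MA": -2.75, "FA": -0.56, "index": +0.74}

# YOLOv5 baseline implied by the two dictionaries above.
implied_baseline = {
    name: round(reported[name] - delta_vs_yolov5[name], 2) for name in reported
}

for name, value in implied_baseline.items():
    print(f"{name}: implied YOLOv5 value {value:.2f}%")
```

If the changes were instead relative, the implied baselines would differ, so these figures should be read only as a consistency check on the reported numbers.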
Furthermore, adding a small-target detection layer, integrating an attention mechanism, and training with a multi-scale method each improve the detection performance of YOLOv5. Adding the small-target detection layer raises the mAP of YOLOv5 by 4.12% and lowers the FA by 0.71%, while the MA and the comprehensive detection index change little. Using the attention mechanism improves the mAP by 3.89%, lowers the FA by 0.63%, reduces the MA by 0.71%, and raises the comprehensive detection index by 0.29%. Using multi-scale training increases the mAP by 3.13%, lowers the MA by 2.85%, and reduces the FA by 0.56%. To demonstrate the effectiveness of the proposed method, MAH-YOLOv5 is compared with Faster R-CNN, SSD, YOLOv4-tiny, DETR, YOLOX, YOLOv7, YOLOv8, and other algorithms. The test results show that the proposed method outperforms the state-of-the-art YOLOv8 in mAP, FPS, MA, FA, and other indicators, and that its comprehensive detection index is second only to that of YOLOv8. Conclusion This study improves YOLOv5 by adding a small-target detection layer and fusing in an attention mechanism, and proposes a camouflage target detection method. The experimental results show that the proposed method substantially improves detection accuracy and recognition rate and can effectively identify camouflage targets against complex backgrounds. In the performance comparison, the comprehensive detection performance of the method is much better than that of Faster R-CNN, SSD, YOLOv4-tiny, DETR, YOLOv7, and other algorithms. These results indicate that the proposed method can provide technical support and a reference for the rapid and accurate identification of battlefield camouflage targets.
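The multi-scale training mentioned above can be sketched as drawing a new input resolution for each batch. The abstract does not give the authors' scale range, so the values below are assumptions: a 640-pixel base size, a ±50% range, and sizes rounded down to a multiple of a 32-pixel stride, loosely modelled on the multi-scale option in the public YOLOv5 training script. The function name and parameters are hypothetical.

```python
import random


def sample_train_size(rng: random.Random, base_size: int = 640,
                      grid_stride: int = 32, low: float = 0.5,
                      high: float = 1.5) -> int:
    """Pick a square training resolution that is a multiple of grid_stride.

    All defaults are illustrative assumptions, not values from the abstract.
    """
    lo = int(base_size * low)
    hi = int(base_size * high)
    # Draw from [lo, hi + grid_stride) and round down to a stride multiple,
    # so the network's coarsest feature map always has whole grid cells.
    return rng.randrange(lo, hi + grid_stride) // grid_stride * grid_stride


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sizes = [sample_train_size(rng) for _ in range(1000)]
print(f"sampled sizes range from {min(sizes)} to {max(sizes)} pixels")
```

In a training loop, each batch would then be resized to the sampled resolution before the forward pass, which is one way to expose the model to targets at varying apparent scales.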
Keywords
