A lightweight remote sensing image detection model combining rotated boxes and an attention mechanism

Li Zhaohui; An Jintang; Jia Hongyu; Fang Yan

Published: 2023-09-20
DOI: 10.11834/jig.220839
2023 | Volume 28 | Number 9

Intelligent object detection in complex-scene images

Li Zhaohui, An Jintang, Jia Hongyu, Fang Yan (School of Maritime Economics and Management, Dalian Maritime University, Dalian 116000, China)

Summary

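Objective Remote sensing image object detection plays an important role in national defense security, intelligent monitoring, and other fields. Targets in remote sensing images are densely arranged and arbitrarily oriented, so traditional horizontal-box detection cannot localize them precisely. Large and ultralarge detection networks have strong representation learning capability, but they neglect the trade-off between accuracy and the amount of computation and number of parameters, cannot meet real-time detection requirements, and are hard to deploy because of their huge parameter counts and computational cost. To address these problems, a lightweight rotated-box object detection model for remote sensing images (YOLO-RMV4) is designed.

Method The original MobileNetv3 network is improved: a better-performing efficient channel attention (ECA) module is added to the feature extraction network, and the network scale is moderately expanded. A path aggregation network (PANet) is added to fuse the features extracted by the backbone at multiple scales, providing the network with richer and more reliable target features. The detection head uses multiscale detection to handle targets of different sizes, and its angle prediction uses the circular smooth label (CSL), which converts angle regression into a classification problem so that the distance between the predicted and true angles can be measured.

Result The proposed model is validated on the prepared AVSP (aerial images of vehicle ship and plane) dataset and compared with seven mainstream lightweight network models. The model size (5.3 MB) is only about 1/8 of that of RYOLOv5l (45.3 MB), while the mean average precision (mAP) is 1.2% higher and the average recall (AR) is 1.6% higher than those of RYOLOv5l. Its mAP and AR are also far higher than those of the other lightweight network models. Ablation experiments on each improved module verify how much each module contributes to model performance.

Conclusion The proposed model combines a lightweight network structure with multiscale fusion and rotated-box detection, achieving real-time inference and high-precision detection with an extremely limited number of parameters.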
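The efficient channel attention (ECA) module named in the method can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the adaptive kernel-size rule with `gamma=2` and `b=1` is assumed from the original ECA-Net formulation, the convolution weights are hypothetical placeholders for learned values, and the function names are invented for this example.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd value of (log2(C) + b) / gamma.

    gamma and b are assumed defaults, not values stated in the abstract.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_attention(feature_map, conv_weights):
    """Apply channel attention to a feature map.

    feature_map: list of C channels, each a list of H rows of W floats.
    conv_weights: odd-length list of 1D convolution weights (hypothetical;
                  in a trained network these are learned).
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(conv_weights)
    pad = k // 2
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Local cross-channel interaction: 1D convolution over the channel axis
    # with zero padding, so every channel only looks at its k neighbours.
    padded = [0.0] * pad + pooled + [0.0] * pad
    logits = [
        sum(conv_weights[j] * padded[c + j] for j in range(k))
        for c in range(len(pooled))
    ]
    # Sigmoid gate in (0, 1) for each channel.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Excite: rescale every value of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Usage: four constant 2x2 channels and a uniform 3-tap kernel.
features = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
rescaled, channel_gates = eca_attention(features, [1.0, 1.0, 1.0])
```

Unlike a squeeze-and-excitation block with two fully connected layers, this gate needs only `k` weights per module, which is why it suits a backbone with a very small parameter budget.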

Keywords

deep learning; rotated box detection; lightweight; attention mechanism; multiscale fusion; remote sensing image

Lightweight object detection model in remote sensing image by combining rotation box and attention mechanism

Li Zhaohui, An Jintang, Jia Hongyu, Fang Yan (School of Maritime Economics and Management, Dalian Maritime University, Dalian 116000, China)

Abstract

Objective Remote sensing image object detection plays an important role in military security, maritime traffic supervision, intelligent monitoring, and other fields. Remote sensing images differ from natural images: most are taken at altitudes ranging from several kilometers to tens of thousands of meters, so the scale of target objects varies widely. Most targets, such as small vehicles, are small, whereas others, such as ships, are huge. Because of the shooting angle, objects are also oriented arbitrarily. These properties pose a major challenge for the feature extraction network in remote sensing object detection, particularly against complex backgrounds. With the continuous improvement in hardware computing power and the rapid development of deep learning theory, large and ultralarge object detection networks have been proposed in recent years to improve detection accuracy. Although these networks have strong representation learning capabilities, they neglect the trade-off between detection accuracy and the model's computational cost and parameter count. Moreover, they struggle to meet real-time detection requirements, and their large parameter counts and computational cost severely restrict deployment. In addition, most general object detection models are designed for natural-image datasets, and their performance on remote sensing images is unsatisfactory. In particular, for densely arranged objects such as ships in port and cars in parking lots, traditional horizontal-box detection cannot achieve precise localization. To address these problems, a lightweight rotated-box object detection model for remote sensing images (YOLO-RMV4) is designed.

Method The open-source datasets DOTA2.0, FAIR1M, and HRSC2016 are used as the base datasets, and four common categories (ship, plane, small vehicle, and large vehicle) are selected as objects. After preprocessing steps such as filtering, segmentation, conversion, and relabeling, the aerial images of vehicle ship and plane (AVSP) dataset is prepared. It contains 19 406 images of 1 024×1 024 pixels and 637 466 object instances. The AVSP labels are provided in both HBB (horizontal box annotation) and OBB (rotated box annotation) form, where an OBB is represented by eight parameters. YOLO-RMV4 is built on an improved MobileNetv3 network. An efficient channel attention (ECA) module is added to the feature extraction network, the network scale is appropriately expanded, an SPPF module is added after the feature extraction network, and a path aggregation network (PANet) is added to fuse the features extracted by the backbone at multiple scales, providing the network with rich and reliable target features. The detection head uses multiscale detection to handle target objects of different sizes. Because more than half of the objects in the dataset are small targets, a detection branch at 4× downsampling is added, giving detection at 4×, 8×, 16×, and 32× downsampling, and the small-target loss is given a higher weight. The circular smooth label (CSL) is added to the angle prediction in the detection head, which converts angle regression into a classification problem. The distance between the predicted and true angles can thus be measured and the angle periodicity problem is resolved, which yields more precise bounding box localization. The anchor sizes are designed according to the characteristics of the dataset. Random cropping, flipping, mosaic, and other data augmentation techniques are used during training.

Result Comparative and ablation experiments are conducted on the AVSP dataset. To verify the effectiveness of the model, YOLO-RMV4 is compared with seven mainstream lightweight network models, using average recall (AR), mean average precision (mAP), parameter count, and detection speed (frames per second, FPS) as evaluation metrics. The size of YOLO-RMV4 (5.3 MB) is only about 1/8 of that of RYOLOv5l (45.3 MB), while its mAP and AR are 1.2% and 1.6% higher, respectively, than those of RYOLOv5l. Moreover, the mAP and AR of YOLO-RMV4 are much higher than those of other lightweight network models (EfficientNet and ShuffleNet). YOLO-RMV4 is also compressed and pruned to obtain YOLO-RMV4S, whose size is only 4.5 MB and which still outperforms common lightweight network models in detection precision and recall. Ablation experiments on each improved module verify its contribution to model performance. Adding PANet increases the mAP by 8.4%: PANet fuses features from different layers, which largely compensates for the limited feature extraction capability of the lightweight network. Adding the rotated detection head increases the mAP by 16.8%, greatly improving detection performance. Adding the ECA module increases the mAP by 1.6%; the module helps the backbone feature extraction network use its limited capacity and parameters to learn the feature information of target objects. Adding detection at 4× downsampling increases the mAP by 3.0% and greatly improves the detection of small target objects. Modules are also removed one at a time from YOLO-RMV4, and the resulting performance degradation is compared to show the role of each module. Finally, the detection accuracy of each category is analyzed: the mAP and AR are highest for planes, followed by ships and large vehicles, and lowest for small vehicles.

Conclusion YOLO-RMV4 combines a lightweight network structure with multiscale fusion and rotated-box detection. The model thus achieves real-time inference and high-precision detection with an extremely limited number of parameters, making it very cost-effective.
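The circular smooth label (CSL) idea used for angle prediction, treating the angle as a class over periodic bins so that neighbouring angles share label mass, can be sketched as follows. The sketch assumes 180 one-degree bins and a Gaussian window with a hypothetical radius and width; the abstract does not state these settings, and the function names are illustrative only.

```python
import math


def circular_distance(a, b, num_bins=180):
    """Shortest distance between two bins on a ring of num_bins bins."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)


def csl_encode(angle_bin, num_bins=180, radius=6, sigma=2.0):
    """Encode an angle bin as a smooth, periodic classification target.

    radius and sigma are assumed illustrative values: bins within `radius`
    of the true bin receive a Gaussian weight, all other bins receive 0.
    """
    label = []
    for i in range(num_bins):
        d = circular_distance(i, angle_bin, num_bins)
        if d <= radius:
            label.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
        else:
            label.append(0.0)
    return label


def csl_decode(scores):
    """Recover the predicted angle bin as the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a target at bin 179 also rewards bin 0, its neighbour on the ring.
target = csl_encode(179)
predicted_bin = csl_decode(target)
```

Because the window wraps around the ring, a prediction of bin 0 for a true angle in bin 179 is penalized as a one-bin error rather than a 179-bin error, which is the periodicity issue that plain angle regression runs into.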

Keywords

deep learning; rotated box detection; lightweight; attention mechanism; multiscale feature fusion; remote sensing image