Object detection in aerial remote sensing images with enhanced small-target features

Zhao Wenqing1,2, Kong Zixu1, Zhou Zhendong1, Zhao Zhenbing3 (1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China; 2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China; 3. School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China)

Abstract
Objective Aerial remote sensing images are dominated by small, arbitrarily oriented targets against complex backgrounds. In conventional object detection algorithms, the feature extraction network downsamples the input image repeatedly, so the resolution drops sharply and target feature information is easily lost; in addition, feature maps of different scales are not fused effectively, and similar features shared by the targets to be detected cannot be correlated. The result is high time complexity and insufficient extracted feature information, which leads to elevated miss and false detection rates. To improve the detection accuracy on aerial remote sensing images, this paper proposes an object detection algorithm based on a parallel high-resolution structure combined with a long short-term memory (LSTM) network. Method First, a parallel high-resolution network is constructed: a high-resolution subnetwork forms the first stage, subnetworks of progressively lower resolution are added stage by stage, the subnetworks are connected in parallel, and feature maps of different resolutions are fused repeatedly while the subnetworks are built, which strengthens the representation of target features. Second, the feature maps extracted by each subnetwork are upsampled by bilinear interpolation and their channel features are concatenated. Finally, a bidirectional LSTM integrates the channel feature information to complete multi-scale detection. Result The proposed detector is evaluated on the COCO (common objects in context) 2017 dataset, the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) vehicle detection dataset, and the UCAS-AOD (University of Chinese Academy of Sciences-Aerial Object Detection) aerial remote sensing dataset, reaching mean average precision (mAP) values of 41.6%, 69.4%, and 69.3%, respectively. Compared with SSD513, the mAP of the proposed algorithm improves by 10.4%, 7.3%, and 8.8% on COCO 2017, KITTI, and UCAS-AOD, respectively. Conclusion The proposed method effectively improves the average detection accuracy of targets in aerial remote sensing images.
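A minimal PyTorch sketch of the parallel high-resolution fusion described in the Method part above. This is our own illustration, not the authors' released code: the module name ExchangeUnit, the channel configuration, and the use of bilinear resizing for both up- and downsampling are assumptions.

```python
# Hypothetical sketch of one multi-resolution fusion step (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExchangeUnit(nn.Module):
    """Fuse the feature maps of parallel branches running at different resolutions.

    Every branch keeps its own resolution; at a fusion step it receives the
    resized features of every other branch, so the high-resolution branch is
    preserved instead of being recovered by a single top-down upsampling pass.
    """

    def __init__(self, channels):  # e.g. channels = [32, 64, 128] (assumed values)
        super().__init__()
        self.channels = channels
        # 1x1 convolutions align the channel count of branch j to branch i
        self.align = nn.ModuleList([
            nn.ModuleList([
                nn.Identity() if i == j else nn.Conv2d(channels[j], channels[i], 1)
                for j in range(len(channels))
            ])
            for i in range(len(channels))
        ])

    def forward(self, feats):  # feats[i]: (N, channels[i], H / 2**i, W / 2**i)
        fused = []
        for i, target in enumerate(feats):
            out = target
            for j, src in enumerate(feats):
                if i == j:
                    continue
                aligned = self.align[i][j](src)
                # resize the source branch to the resolution of the target branch
                aligned = F.interpolate(aligned, size=target.shape[-2:],
                                        mode='bilinear', align_corners=False)
                out = out + aligned
            fused.append(F.relu(out))
        return fused

# usage: three parallel branches at decreasing resolution
feats = [torch.randn(1, 32, 128, 128),
         torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32)]
fused = ExchangeUnit([32, 64, 128])(feats)  # same shapes as the inputs
```

Repeating such fusion steps while lower-resolution subnetworks are added stage by stage is what the abstract refers to as strengthening the representation of target features.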
Keywords
Target detection algorithm of aerial remote sensing based on feature enhancement technology

Zhao Wenqing1,2, Kong Zixu1, Zhou Zhendong1, Zhao Zhenbing3 (1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China; 2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China; 3. School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China)

Abstract
Objective The detection of targets in aerial remote sensing images has many military and civilian applications. On the one hand, the spatial resolution of remote sensing images keeps increasing as imaging technology improves; on the other hand, the results can be applied to urban traffic planning, military target tracking, ground object classification, and other tasks. Most advanced target detection algorithms (such as Fast region-based convolutional neural network (R-CNN), Mask R-CNN, and the single shot multibox detector (SSD)) are evaluated on general-purpose datasets. However, a classifier trained on a general-purpose dataset does not detect aerial remote sensing images well, mainly because of the particularity of such images. First, scale diversity: an aerial remote sensing image is taken from an altitude of several hundred meters or even up to 10 000 m, so objects of the same category differ greatly in size. Taking ships in a port as an example, a super-large ship is nearly 400 m long, whereas a small boat is only tens of meters long. Second, perspective particularity: aerial remote sensing images are shot from a high-altitude viewpoint, so the objects appear as top views, which differ considerably from the horizontal-perspective images in commonly used datasets; a detector trained on such datasets therefore performs poorly when applied to remote sensing images in practice. Third, the small target problem: most targets in an aerial remote sensing image are small (tens of pixels or even a few pixels) and carry very little information, and mainstream detection algorithms handle them poorly, mainly because detection methods based on convolutional neural networks use pooling layers that further reduce the already limited information. For example, a 24 × 24 pixel target is reduced to roughly 1 × 1 pixel after four pooling layers, leaving too little information to classify. Fourth, high background complexity: because an aerial remote sensing image is taken from high altitude, its field of view is large (usually several square kilometers) and contains a vast amount of background clutter, so the background blends with small targets and strongly interferes with detection. In general, the recognition rate of small targets in remote sensing images is low, the scales are diverse, the orientations are disordered, and the background is complex. On the one hand, edge information is lost when a small target is pooled; on the other hand, the semantic information of the feature map is not strong enough to detect the corresponding target. In this paper, a parallel high-resolution network structure combined with long short-term memory (LSTM) is proposed to replace the visual geometry group 16-layer network (VGG16) backbone of SSD and improve the detection accuracy of the algorithm for aerial targets. Method This paper introduces the high-resolution network (HRNet) and the LSTM network into the SSD model. The main characteristic of the HRNet parallel network is that a high-resolution representation of the input image is maintained throughout, which differs from the traditional scheme of extracting features top-down and then upsampling them to restore the feature size.
The parallel structure effectively reduces the number of downsampling operations and the loss of edge feature information of the targets to be detected. The LSTM network is a variant of the recurrent neural network (RNN). A plain RNN cannot be trained deeply because its gradients vanish; the LSTM combines short-term and long-term memory through carefully designed gating, which alleviates gradient vanishing and gradient explosion to a certain extent. In the proposed method, first, the parallel high-resolution feature map scheme of HRNet is used to build the residual module: the first stage is a high-resolution subnetwork, subnetworks of progressively lower resolution are added stage by stage, and the multistage subnetworks are connected in parallel. Second, features are fused repeatedly across resolutions to obtain rich feature information. Finally, the feature map of each subnetwork is upsampled and fused, the channel information is integrated with a bidirectional LSTM, and context information is used effectively to perform multiscale detection. Result The improved network is applied to the SSD algorithm and compared with the SSD method on the common objects in context (COCO) 2017 dataset, the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) dataset, and the University of Chinese Academy of Sciences-Aerial Object Detection (UCAS-AOD) aerial target dataset. On the COCO 2017 dataset, the mean average precision (mAP) of the model is 41.6%, which is 10.4% higher than that of SSD513 + ResNet-101. On the KITTI and UCAS-AOD datasets, the mAP of the model is 69.4% and 69.3%, respectively. Compared with SSD513, the average detection accuracy of the proposed algorithm increases by 10.4%, 7.3%, and 8.8% on the COCO 2017, KITTI, and UCAS-AOD datasets, respectively. Conclusion The results show that this method reduces the miss detection rate of small targets and improves the average detection accuracy over all targets.
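A hedged sketch of the final Method step (bilinear upsampling of every subnetwork output, channel concatenation, and integration of the channel information with a bidirectional LSTM). The module name, the hidden size, and in particular the choice to let the LSTM scan the concatenated channels at each spatial position are our assumptions about what "integrating channel information" means; the paper's exact wiring may differ.

```python
# Hypothetical sketch of the upsample + concatenate + bidirectional LSTM step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelBiLSTMFusion(nn.Module):
    """Upsample every branch to the highest resolution, concatenate along the
    channel axis, and let a bidirectional LSTM relate the channels of each
    spatial position before the multiscale detection heads consume the result."""

    def __init__(self, in_channels, hidden=64):  # in_channels = [32, 64, 128] (assumed)
        super().__init__()
        self.total = sum(in_channels)
        # sequence length = number of concatenated channels, one value per step
        self.bilstm = nn.LSTM(input_size=1, hidden_size=hidden,
                              bidirectional=True, batch_first=True)
        self.project = nn.Linear(2 * hidden, 1)  # back to one value per channel

    def forward(self, feats):
        h, w = feats[0].shape[-2:]  # resolution of the highest-resolution branch
        up = [F.interpolate(f, size=(h, w), mode='bilinear', align_corners=False)
              for f in feats]
        x = torch.cat(up, dim=1)    # (N, total, H, W): concatenated channel features
        n = x.shape[0]
        # treat the channel axis as the LSTM sequence for every spatial location
        seq = x.permute(0, 2, 3, 1).reshape(n * h * w, self.total, 1)
        out, _ = self.bilstm(seq)   # (N*H*W, total, 2*hidden)
        out = self.project(out).reshape(n, h, w, self.total).permute(0, 3, 1, 2)
        return out                  # fused map fed to the SSD-style detection heads

# usage with the three branch outputs of the parallel network
feats = [torch.randn(1, 32, 128, 128),
         torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32)]
fused = ChannelBiLSTMFusion([32, 64, 128])(feats)  # (1, 224, 128, 128)
```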
Keywords
