Published: 2019-08-16
DOI: 10.11834/jig.180649
2019 | Volume 24 | Number 8

Remote Sensing Image Processing

注意力机制改进卷积神经网络的遥感图像目标检测

李红艳, 李春庚, 安居白, 任俊丽

大连海事大学信息科学技术学院, 大连 116026

Received: 2018-12-10; Revised: 2019-01-10

基金项目: 国家自然科学基金项目（61471079）

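First author: Li Hongyan, born in 1994, female, master's student. Her research interests include computer vision, deep learning, and multi-object detection and recognition in images. E-mail: hyl@dlmu.edu.cn;
An Jubai, male, professor. His research interests include digital image processing and computer vision. E-mail: jubaian@dlmu.edu.cn;
Ren Junli, female, master's student. Her research interests include digital image processing and object tracking. E-mail: junli_ren1@163.com.

CLC number: TP753

Document code: A

Article ID: 1006-8961(2019)08-1400-09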

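Abstract

Objective Object detection is one of the core problems in remote sensing image processing; it aims to locate and identify objects of interest in remote sensing images. To address the low accuracy of object detection in remote sensing images, experiments are carried out on the public NWPU_VHR-10 dataset, and the low-quality images in the dataset are reconstructed with the enhanced deep super-resolution (EDSR) network to provide a high-quality dataset for training convolutional neural networks. Method The original Faster-RCNN (region convolutional neural network) is improved. An attention module is added to the feature extraction network to capture more information about the objects that need attention and to suppress other useless information, which adapts the network to the complex backgrounds and small objects caused by the wide field of view of remote sensing images. Soft non-maximum suppression is used to cope with object rotation in remote sensing images. In addition, the cross-correlation between object class distributions is used to further filter redundant candidate boxes, which lowers the false alarm rate and further improves detector performance. Result Two groups of comparative experiments were carried out to demonstrate the effectiveness of the method. The first is an ablation study of the proposed modules; the improved algorithm raises the mean detection accuracy by 12.2 percentage points over the original Faster-RCNN, which demonstrates the effectiveness of each module. The second compares the proposed method with existing methods on the NWPU_VHR-10 dataset; the proposed algorithm reaches a mean detection accuracy of 79.1%, higher than that of the other algorithms compared. Conclusion EDSR is used to super-resolve the images and Faster-RCNN is improved, which increases the adaptability of the algorithm to complex backgrounds, small objects, and object rotation in remote sensing image object detection. Experimental results show that the mean detection accuracy of the proposed algorithm is improved.

Keywords

remote sensing images; object detection; attention mechanism; convolutional neural network; image super-resolution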

Attention mechanism improves CNN remote sensing image object detection

Li Hongyan, Li Chungeng, An Jubai, Ren Junli

School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China

Supported by: National Natural Science Foundation of China (61471079)

Abstract

Objective Remote sensing image object detection aims to locate and identify objects of interest in remote sensing images and is one of the core problems in remote sensing image processing. Object detection in optical remote sensing images is a fundamental and challenging problem in aerial and satellite image analysis and an important part of the automated extraction of remote sensing information. It has broad application value in national defense security, urban construction planning, and disaster monitoring, and has received great attention in recent years. As the range of applications of remote sensing images expands, fast and effective detection methods have broad application prospects. With the rapid development of platform and sensor technology, the spatial resolution of remote sensing images continues to increase and their visual difference from natural images is shrinking. An increasing number of computer vision methods can therefore be applied to object recognition in high-spatial-resolution remote sensing images, but low detection accuracy and low efficiency remain problems. Method This paper proposes a convolutional neural network (CNN) detection method improved with an attention mechanism and tests it on the NWPU_VHR-10 dataset, a 10-class geospatial object detection dataset. Some of its images have low resolution, which affects the experimental results. These low-quality images are therefore reconstructed with the enhanced deep super-resolution (EDSR) network to provide a high-quality dataset for training CNNs. The paper studies how to adapt the Faster-RCNN model for multi-class object recognition to the characteristics that distinguish remote sensing images from natural images. The original Faster-RCNN is improved as follows. An attention module is added to the feature extraction network; the resulting attention CNN captures more information about the objects of interest and suppresses other useless information, which adapts it to the complex backgrounds and small objects caused by the wide field of view of remote sensing images. Soft non-maximum suppression is used to cope with object rotation in remote sensing images. To further improve detector performance, the cross-correlation between object class distributions is used to filter redundant candidate boxes and reduce the false alarm rate. Result Two sets of comparative experiments were conducted to demonstrate the validity of the method. The first is an ablation study of the four modules proposed in this paper: the attention module, soft non-maximum suppression, the cross-correlation filtering mechanism, and super-resolution processing of low-quality images. The improved attention CNN has higher detection accuracy than the original Faster-RCNN in all 10 categories, and the mean detection accuracy improves by 12.2 percentage points. All the proposed modules improve object detection in aerial remote sensing images. Moreover, the added attention module is lightweight and hardly increases the computational cost of the network, so it does not reduce its efficiency. The second set compares the improved attention CNN with existing traditional and deep learning methods on the public NWPU_VHR-10 dataset. The mean detection accuracy of the proposed algorithm is 79.1%, higher than that of the other algorithms. Conclusion CNNs have great potential in remote sensing image object detection and are a current and future research hotspot. How to better apply CNNs to object detection in aerial remote sensing images is of theoretical importance. In this study, the EDSR network is used to super-resolve the low-resolution images in the dataset, and an attention mechanism is used to improve Faster-RCNN so that the algorithm focuses on the target regions of interest in the image, that is, on the extracted features that are most valuable for the current detection task. This improves the adaptability of the algorithm to the complex backgrounds and small objects caused by the wide field of view, and to the object rotation caused by the overhead viewing angle of aerial photography. Experimental results show that the proposed algorithm improves the mean detection accuracy.

Key words

remote sensing images; object detection; attention mechanism; convolutional neural networks (CNN); image super-resolution

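0 Introduction

Object detection in optical remote sensing images is a branch of the classical object detection problem. Optical remote sensing images are characterized by a wide field of view, highly complex backgrounds, unusual viewing angles, rotated objects, and small objects. They offer more regions of interest but also bring more complex background information, which poses great challenges for object detection. Detecting objects of interest accurately and efficiently in large-format optical remote sensing images has become one of the key problems for the automated extraction of remote sensing information and for practical applications of remote sensing images.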

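Traditional object detection methods for remote sensing images are usually based on image processing: threshold segmentation and texture/geometric feature extraction are carried out first, and objects are then identified by template matching, background modeling, or shallow-learning methods^[1]. Over the past few decades, researchers have developed a variety of feature representations for detecting different types of objects in aerial remote sensing images, mainly histogram of oriented gradients (HOG) features, bag-of-words (BoW) features, and Gabor features. The BoW model^[2] is one of the commonly used traditional methods for object detection in remote sensing images, and the emergence of compressed sensing led to feature representations based on sparse coding being applied to remote sensing image analysis^[3-5]. Traditional object recognition methods rely on hand-designed features that draw on a large amount of prior knowledge, and extensive feature engineering experiments are needed to find features useful for a specific problem^[6]. As a result, the features abstract and generalize poorly, and the workload and engineering complexity are high. Models trained on hand-crafted or shallow-learning features, and manually defined templates, have succeeded on some specific object detection tasks, but as visual recognition tasks become more challenging, the descriptive power of these features may become limited or even insufficient^[7]. In optical remote sensing image object detection, the special imaging conditions and complex backgrounds mean that traditional detection methods do not perform well.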

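In recent years, deep convolutional neural networks have succeeded in natural image classification, object recognition, and other fields, because they can learn more representative features through large-scale supervised training. Current mainstream deep-learning detection models fall into two categories: one-stage and two-stage detection algorithms. One-stage algorithms include the single-shot detector YOLO^[8] and the single shot multibox detector (SSD)^[9]; two-stage algorithms include the R-CNN (region convolutional neural network)^[10] family of region-proposal-based algorithms. Compared with traditional hand-engineered features, neural networks with deep architectures can learn more expressive features directly from image pixels^[11], so the burden of feature design shifts to network construction. Many researchers have built on deep learning frameworks to design algorithms that improve object detection accuracy in remote sensing images. One example is the rotation-invariant convolutional neural network (RICNN)^[7], which introduces and learns a new rotation-invariant layer on top of AlexNet^[12]. It handles object rotation in remote sensing images well, but it generates a large number of regions of interest on the image before convolution; these regions overlap heavily, so the overlapping areas are convolved repeatedly, which wastes computing resources. The visual-perception detection algorithm based on a single-shot detector^[13] needs no region proposal stage, but normalizing the image causes object features to be lost.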

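Faster-RCNN^[14] further improves on R-CNN and Fast-RCNN^[15]. It replaces the selective search algorithm with a region proposal network (RPN) that is integrated into the deep network. This not only removes the bottleneck of the slow CPU implementation of selective search, but also shares the preceding convolutional computation with the deep network, which improves computational efficiency. This paper applies Faster-RCNN to object recognition in optical remote sensing images and improves it to address problems specific to this task. 1) To cope with the complex backgrounds and small objects caused by the wide field of view of remote sensing images, an attention module is added to the feature extraction network. 2) To reduce the effect of object rotation on detection accuracy, soft non-maximum suppression is used. 3) The cross-correlation between object class distributions is used to further filter redundant candidate boxes, which lowers the false alarm rate and further improves detector performance.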

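1 Data preprocessing

1.1 Data augmentation for the small-sample dataset

Data augmentation aims to enlarge the dataset and to prevent machine learning models from overfitting. NWPU_VHR-10 is a small-sample dataset, and this paper expands it with three kinds of augmentation: flipping, translation, and zoom-and-resize. 1) Flipping (horizontal or vertical) multiplies the amount of data and suits small datasets. 2) Translation shifts the image within the image plane in the horizontal or vertical direction; the shift range and step can be chosen at random or specified manually. It changes the position of the image content so that the model does not overfit to the probability of objects appearing at a particular position. 3) Zoom-and-resize selects part of the image, randomly or manually, and resizes it to the original image size. Because the features of the same object vary with scale, this lets the neural network "see" more varied features, which improves the generalization ability and performance of the model.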
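The three augmentations described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the zero-fill used for translation, and the nearest-neighbour resizing are assumptions, since the paper does not state how the operations were implemented.

```python
import numpy as np

def flip(image, horizontal=True):
    """Mirror the image left-right (horizontal) or top-bottom (vertical)."""
    return image[:, ::-1] if horizontal else image[::-1, :]

def translate(image, dx, dy, fill=0):
    """Shift the content by (dx, dy) pixels; uncovered pixels are set to `fill` (assumed)."""
    h, w = image.shape[:2]
    out = np.full_like(image, fill)
    # Source window that remains inside the image after the shift.
    x0, x1 = max(0, -dx), min(w, w - dx)
    y0, y1 = max(0, -dy), min(h, h - dy)
    if x1 <= x0 or y1 <= y0:
        return out  # the shift moved everything out of view
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = image[y0:y1, x0:x1]
    return out

def crop_and_resize(image, top, left, height, width):
    """Crop a window (must lie inside the image) and resize it back to the
    original size with nearest-neighbour sampling (assumed interpolation)."""
    h, w = image.shape[:2]
    crop = image[top:top + height, left:left + width]
    rows = np.arange(h) * height // h   # source row for each output row
    cols = np.arange(w) * width // w    # source column for each output column
    return crop[rows][:, cols]
```

In a detection setting the bounding-box annotations would have to be transformed with the same parameters; that bookkeeping is omitted here.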

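1.2 Image super-resolution

The images in the NWPU_VHR-10 dataset are cropped from Google Earth and the Vaihingen dataset. Zoom-and-resize augmentation takes a small part of a cropped image and enlarges it directly to the original size, which lowers the resolution and loses feature information, so image quality degrades, as shown in Fig. 1.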

Fig. 1 Example of image quality degradation caused by zoom-and-resize augmentation

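To solve this problem, the remote sensing images whose resolution has been reduced are super-resolved with the enhanced deep super-resolution network (EDSR)^[16], which reconstructs a high-resolution image. The reconstruction is better than that of nearest-neighbor and bilinear interpolation. The EDSR network structure is shown in Fig. 2, and its residual block is compared with those of the original ResNet^[17] and SRResNet^[18] in Fig. 3.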

Fig. 2 Network structure of EDSR

Fig. 3 Comparison of residual blocks in the original ResNet, SRResNet, and EDSR

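As Figs. 2 and 3 show, the model removes the batch normalization (BN) modules from the conventional residual network and scales up the model. A BN layer consumes the same amount of memory as the convolutional layer before it. Once it is removed, EDSR can, with the same computing resources, stack more layers or extract more features per layer and thus perform better, which keeps training stable while saving memory during training. During training, EDSR trains the low-scale upsampling model first, which shortens the training time of the high-scale upsampling models and also gives good results. This resolves the loss of image resolution caused by zoom-and-resize augmentation.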

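2 Improved convolutional network

2.1 Using the attention model

Attention plays an important role in human perception. An important property of the human visual system is that people do not try to process a whole scene at once; instead, they use a sequence of partial glimpses and selectively focus on salient parts to capture visual structure better. Attention models (AM) are widely used in many kinds of deep learning tasks, such as natural language processing, image recognition, and speech recognition, and are a technique in deep learning that deserves attention and understanding^[19]. In essence, the attention mechanism in deep learning resembles the selective visual attention of humans: its core goal is to select, from a large amount of information, the information that is most critical to the current task.

Neither traditional methods nor existing convolutional neural network methods handle small objects in remote sensing images well. This paper introduces an attention model into the original Faster-RCNN framework for object detection in remote sensing images. The module used is CBAM (convolutional block attention module)^[20], which combines spatial and channel attention, and the feature extraction network is ResNet101. To integrate CBAM into the ResNet structure, the module is added after each residual block, as shown in Fig. 4. Given an intermediate feature map $\boldsymbol{F} \in \mathbb{R}^{C \times H \times W}$ as input, CBAM sequentially infers a 1D channel attention map $\boldsymbol{M}_{c} \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D spatial attention map $\boldsymbol{M}_{s} \in \mathbb{R}^{1 \times H \times W}$, and multiplies the attention maps with the input feature map for adaptive feature refinement, which works better than an attention mechanism that attends to channels only. Because CBAM is a lightweight module, it can be integrated into a CNN framework with negligible overhead and trained end to end together with the CNN. The experiments show that adding this module improves the detection performance of the network.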
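The sequential channel-then-spatial refinement of CBAM can be illustrated with a small NumPy forward pass. This sketch follows the structure described in the CBAM reference [20] (a shared two-layer MLP over average- and max-pooled channel descriptors, then a convolution over channel-wise average and max maps); the function names and the weights `w0`, `w1`, and `kernel` are placeholders assumed for illustration, and it is not the network used in the paper, where these weights are learned inside ResNet101.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w0, w1):
    """Channel map M_c of shape (C, 1, 1) for a feature map of shape (C, H, W)."""
    def shared_mlp(v):
        # Two-layer MLP with a ReLU hidden layer, shared by both descriptors.
        return w1 @ np.maximum(w0 @ v, 0.0)
    avg_desc = feat.mean(axis=(1, 2))   # (C,) average-pooled descriptor
    max_desc = feat.max(axis=(1, 2))    # (C,) max-pooled descriptor
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))[:, None, None]

def spatial_attention(feat, kernel):
    """Spatial map M_s of shape (1, H, W); `kernel` has shape (2, k, k), k odd."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    height, width = feat.shape[1:]
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None, :, :]

def cbam(feat, w0, w1, kernel):
    """Apply channel attention first, then spatial attention."""
    refined = channel_attention(feat, w0, w1) * feat
    return spatial_attention(refined, kernel) * refined
```

Both attention maps lie in [0, 1], so the module can only rescale existing activations, which is consistent with its role of emphasizing task-relevant features and suppressing the rest.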

Fig. 4 CBAM module based on the residual structure

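2.2 Replacing NMS with Soft-NMS

When a detector detects objects in an image, the final candidate bounding boxes inevitably overlap to some extent. When the overlap is high (above a threshold $N_{t}$), the box with the highest confidence is output and the other predictions are discarded; this method is called non-maximum suppression (NMS). However, optical remote sensing images are almost always taken looking down from high altitude, so objects appear in many orientations, that is, they are rotated. Because the annotation boxes in the dataset are all axis-aligned, a large number of object boxes overlap, much as objects occlude one another in natural images. To overcome the poor fit of NMS to this situation, this paper replaces NMS with soft non-maximum suppression (Soft-NMS)^[21]. NMS is too blunt, whereas Soft-NMS smooths the score function; reference [21] proposes two smoothing functions, a linear weighting function and a Gaussian weighting function.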

The original NMS is

$ s_{i}=\left\{\begin{array}{ll}{s_{i}} & {f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right) <N_{t}} \\ {0} & {f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right) \geqslant N_{t}}\end{array}\right. $

(1)

The linear weighting is

$ s_{i}=\left\{\begin{array}{ll}{s_{i}} & {f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right) <N_{t}} \\ {s_{i}\left(1-f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right)\right)} & {f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right) \geqslant N_{t}}\end{array}\right. $

(2)

The Gaussian weighting is

$ {{s}_{i}}={{s}_{i}}{{\text{e}}^{\frac{-{{f}_{\text{IoU}}}{{\left( \boldsymbol{M}, {{\boldsymbol{b}}_{i}} \right)}^{2}}}{\sigma }}}, \quad \forall {{\boldsymbol{b}}_{i}}\notin \boldsymbol{D} $

(3)

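where $s_{i}$ is the score of $\boldsymbol{b}_{i}$, $\boldsymbol{M}$ is the box with the current highest score, $\boldsymbol{b}_{i}$ is a box to be processed, $\boldsymbol{D}$ is the set of final detection boxes, $\text{IoU}$ (intersection over union) is the overlap ratio, $f_{\mathrm{IoU}}\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right)$ is the overlap ratio between $\boldsymbol{M}$ and $\boldsymbol{b}_{i}$, and $\sigma$ controls how quickly the Gaussian penalty decays. The larger the $f_{\mathrm{IoU}}$ between $\boldsymbol{b}_{i}$ and $\boldsymbol{M}$, the more $s_{i}$ decreases. The experiments show that, with all other conditions the same, Gaussian weighting works better than linear weighting, and both improve detection accuracy over the original network, which demonstrates the effectiveness of the improved network.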
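The three scoring rules in Eqs. (1) to (3) differ only in the weight applied to the scores of boxes that overlap the current best box, which a short NumPy sketch makes explicit. The function names, the default values of `n_t`, `sigma`, and `score_thresh`, and the `[x1, y1, x2, y2]` box layout are assumptions for illustration; the paper does not report its parameter settings.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, method="gaussian", n_t=0.3, sigma=0.5, score_thresh=0.001):
    """Return the kept indices (in selection order) and the rescored scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        best = remaining[np.argmax(scores[remaining])]   # box M
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        overlap = iou(boxes[best], boxes[remaining])
        if method == "linear":        # Eq. (2)
            weight = np.where(overlap >= n_t, 1.0 - overlap, 1.0)
        elif method == "gaussian":    # Eq. (3)
            weight = np.exp(-(overlap ** 2) / sigma)
        else:                         # Eq. (1), hard NMS
            weight = np.where(overlap >= n_t, 0.0, 1.0)
        scores[remaining] = scores[remaining] * weight
        # Drop boxes whose score has decayed below the (assumed) threshold.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores
```

With hard NMS a heavily overlapping box is removed outright, whereas the linear and Gaussian rules keep it with a reduced score, so a neighbouring object whose axis-aligned box overlaps a stronger detection can still be reported.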

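2.3 Introducing cross-correlation

Fig. 5 shows the distribution of each object class of the NWPU_VHR-10 dataset over the images; the horizontal axis is the image index and the vertical axis is the number of objects of that class in the image. Fig. 5 clearly shows that the distributions of the 10 classes are correlated to some degree. For example, baseball fields and basketball courts often appear in the same image, whereas baseball fields and bridges rarely appear together.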

Fig. 5 Distribution of the 10 object classes ((a) aircraft; (b) ship; (c) oil tank; (d) baseball field; (e) tennis court; (f) basketball court; (g) playground; (h) port; (i) bridge; (j) car)

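This paper uses the cross-correlation function to express the relationship between them. The cross-correlation function is a concept from signal analysis that expresses the degree of correlation between two time series, that is, between the values of the signals $x(t), y(t)$ at any two different times $t_{1}, t_{2}$.

For continuous functions, cross-correlation is defined as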

$ (f * g)(\tau)=\int_{-\infty}^{+\infty} f^{*}(t) g(t+\tau) \mathrm{d} t $

(4)

For discrete functions it is defined as

$ (f * g)[n]=\sum\limits_{m=-\infty}^{+\infty} f^{*}[m]\, g[m+n] $

(5)

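where $f*g$ denotes the cross-correlation function of $f$ and $g$, and $f^{*}$ denotes the complex conjugate of $f$. Eqs. (4) and (5) show that cross-correlation resembles convolution: both slide two sequences across each other and multiply them. It reflects how well two functions match at different relative positions, which coincides with the quantity to be analyzed here, the correlation between two object classes in the same image (that is, the probability that they appear together); hence the cross-correlation mechanism is introduced. Because the distribution of each class over the images is a discrete function, Eq. (5) is used, and the result is normalized by min-max normalization, Eq. (6). The cross-correlation between each pair of the 10 classes is denoted $R\left(x_{i}, x_{j}\right)$.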

Min-max normalization is

$ x^{*}=\frac{x-x_{\min }}{x_{\max }-x_{\min }} $

(6)
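One way to read Eqs. (5) and (6) together is sketched below: each class is represented by its per-image object counts, the raw correlation of two classes is the sum of the products of their counts, and the resulting matrix is min-max normalized. Two points are assumptions, because the paper does not state them: only the zero-lag value ($n = 0$) of Eq. (5) is used, and the normalization runs over all entries of the matrix. The function name and the toy `counts` array are invented for illustration.

```python
import numpy as np

def cross_correlation_matrix(counts):
    """counts[i, m] = number of objects of class i in image m.
    Returns R with R[i, j] in [0, 1] after min-max normalization (Eq. (6))."""
    counts = np.asarray(counts, dtype=float)
    # Zero-lag value of Eq. (5); the counts are real, so f* equals f.
    raw = counts @ counts.T
    lo, hi = raw.min(), raw.max()
    if hi == lo:
        return np.zeros_like(raw)   # degenerate case: all entries equal
    return (raw - lo) / (hi - lo)

# Toy example with 3 classes and 3 images (invented numbers).
counts = [[2, 0, 1],
          [1, 0, 2],
          [0, 3, 0]]
R = cross_correlation_matrix(counts)
```

In this toy example the first two classes share images and receive a correlation of 4/9, while the third class never co-occurs with them and receives 0.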

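After $\text{IoU}$-based filtering by Soft-NMS, the cross-correlation mechanism filters the boxes further. The core idea is that the lower the correlation between the class of a detected box and the class of the current highest-scoring box, the more its score decreases. The cross-correlation suppression is computed as

$ {{s}_{i}}={{s}_{i}}{{\text{e}}^{\frac{-{{\left( 1-R\left( \boldsymbol{M}, {{\boldsymbol{b}}_{i}} \right) \right)}^{2}}}{\sigma }}}, \forall {{\boldsymbol{b}}_{i}}\notin \boldsymbol{D} $

(7)

where $\boldsymbol{b}_{i}$ is the current detection box, $s_{i}$ is the score of $\boldsymbol{b}_{i}$, $\boldsymbol{M}$ is the box with the current highest confidence, and $R\left(\boldsymbol{M}, \boldsymbol{b}_{i}\right)$ is the correlation between the classes of $\boldsymbol{M}$ and $\boldsymbol{b}_{i}$.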
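Eq. (7) can be sketched as a rescoring step applied to the boxes that survive Soft-NMS. For brevity the sketch performs a single pass relative to the highest-scoring box rather than the full iterative loop implied by $\forall \boldsymbol{b}_{i} \notin \boldsymbol{D}$; that simplification, the function name, and the default `sigma` are assumptions, and `R` is any class-by-class correlation matrix with entries in [0, 1].

```python
import numpy as np

def correlation_suppress(scores, labels, R, sigma=0.5):
    """Decay each score by exp(-(1 - R[class(M), class(b_i)])^2 / sigma), Eq. (7).
    The top-scoring box M itself is left unchanged."""
    scores = np.asarray(scores, dtype=float).copy()
    labels = np.asarray(labels)
    R = np.asarray(R, dtype=float)
    top = int(np.argmax(scores))                 # index of box M
    r = R[labels[top], labels]                   # correlation of each box's class with M's class
    decay = np.exp(-((1.0 - r) ** 2) / sigma)
    others = np.arange(len(scores)) != top
    scores[others] = scores[others] * decay[others]
    return scores
```

A box whose class is perfectly correlated with that of $\boldsymbol{M}$ ($R = 1$) keeps its score, while a box of a class that rarely co-occurs with it ($R$ near 0) is penalized by a factor approaching $e^{-1/\sigma}$.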

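3 Experimental results and analysis

Fig. 6 shows the detection results of the proposed method on nine images from the NWPU_VHR-10 dataset. For both large objects such as playgrounds and small objects such as cars, the proposed algorithm detects the objects fairly accurately and produces reasonable bounding boxes.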

Fig. 6 Examples of detection on nine images ((a) aircraft; (b) oil tank and ship; (c) tennis court; (d) baseball field and playground; (e) baseball field and basketball court; (f) baseball field, tennis court and basketball court; (g) port; (h) bridge; (i) car)

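Precision and recall are commonly used metrics for analyzing object detection results. Fig. 7 compares the precision-recall (P-R) curves of Faster-RCNN and the proposed algorithm; the horizontal axis is recall and the vertical axis is precision. $p(r)$ is precision as a function of recall, and the area AP between the curve and the coordinate axes gives a more direct measure of detector performance. It is computed as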

$ f_{\mathrm{AP}}=\int_{0}^{1} p(r) \mathrm{d} r $

(8)
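In practice the integral in Eq. (8) is evaluated numerically from a finite set of (recall, precision) points. The sketch below uses the all-point interpolation familiar from PASCAL VOC style evaluation, in which the precision at each recall level is replaced by the maximum precision at any higher recall; this particular scheme and the function name are assumptions, since the paper does not say how the area was computed.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve using all-point interpolation."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    order = np.argsort(recall)
    # Sentinel points at recall 0 and 1.
    r = np.concatenate(([0.0], recall[order], [1.0]))
    p = np.concatenate(([0.0], precision[order], [0.0]))
    # Precision envelope: make p non-increasing as recall grows.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

For example, a detector with precision 1.0 at recall 0.5 and precision 0.5 at recall 1.0 has an AP of 0.5 × 1.0 + 0.5 × 0.5 = 0.75 under this scheme.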

Fig. 7 Comparison of precision-recall curves ((a) original Faster-RCNN; (b) our method)

As can be seen, the area under the P-R curve of the proposed algorithm is larger.

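To verify the effectiveness of the proposed algorithm, an ablation study of the proposed modules is carried out on the public NWPU_VHR-10 dataset. The algorithm is then compared with three traditional methods, BoW^[2], the detection algorithm based on sparse-coding features (FDDL)^[4], and the collection of part detectors (COPD)^[22]; with two convolutional neural network algorithms, transferred CNN (T-CNN)^[23] and RICNN^[7]; and with the original Faster-RCNN. The test platform has an Intel Core i7 7700 CPU and an Nvidia GTX1060 GPU with 6 GB of memory, and runs Ubuntu x64. Average precision (AP) and mean average precision (mAP) are used to evaluate per-class and overall detection performance.

Table 1 compares the ablation results of the proposed modules. The mAP of the original Faster-RCNN is 66.9%. The results in the table confirm that super-resolution processing of the data and the three improvements to the network all raise the mean detection accuracy. The attention module contributes the largest single gain (from 66.9% to 72.5% when added alone): it allows the feature extraction network to capture more information relevant to the task and to discard other useless information.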

Table 1 Comparison of ablation experiments for each module in this paper

NMS	EDSR	CBAM	Soft-NMS	Cross-correlation	mAP/%
√	-	-	-	-	66.9
√	√	-	-	-	68.7
√	-	√	-	-	72.5
-	-	-	√	-	71.4
√	-	-	-	√	70.8
√	√	√	-	√	75.0
-	√	√	√	√	79.1
Note: bold type marks the best result in the original table (79.1%, last row).

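Table 2 compares the average precision of the seven algorithms on the NWPU_VHR-10 dataset. The proposed algorithm has lower AP than FDDL, T-CNN, and RICNN on oil tanks, and lower AP than RICNN on bridges. On tennis courts and ports, however, the classes with the most severe rotation in the dataset, its AP is higher than that of all the other methods, and its AP on playgrounds reaches 99.7%. This shows that the proposed algorithm adapts well to object rotation in remote sensing images.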

Table 2 Comparison of average precision (AP, %) for seven algorithms on the NWPU_VHR-10 dataset

Class	BoW	FDDL	COPD	T-CNN	RICNN	Original Faster-RCNN	Ours
Aircraft	25.0	29.2	62.3	66.1	88.3	89.9	95.2
Ship	58.5	37.6	68.9	56.9	77.3	67.9	79.7
Oil tank	63.2	77.0	63.7	84.3	85.3	63.4	73.7
Baseball field	9.0	25.8	83.3	81.6	88.1	93.2	96.4
Tennis court	4.4	2.8	32.1	35.0	40.4	69.7	71.6
Basketball court	3.2	3.7	36.3	45.9	58.4	52.9	72.1
Playground	7.8	20.1	85.3	80.0	86.7	94.6	99.7
Port	53.0	25.4	55.3	62.0	68.6	61.6	73.2
Bridge	12.2	21.5	14.8	42.3	61.5	21.8	57.0
Car	9.1	4.5	44.0	42.9	71.1	54.2	72.0
mAP	24.6	24.8	54.6	59.7	72.6	66.9	79.1
Note: bold type marks the best result in each row in the original table.
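The mAP row of Table 2 is the unweighted mean of the ten per-class AP values, which can be checked directly from the table. The snippet below copies the per-class values of the original Faster-RCNN and of the proposed method from Table 2; the variable and function names are illustrative.

```python
# Per-class AP (%) copied from Table 2.
ap_faster_rcnn = {
    "aircraft": 89.9, "ship": 67.9, "oil tank": 63.4, "baseball field": 93.2,
    "tennis court": 69.7, "basketball court": 52.9, "playground": 94.6,
    "port": 61.6, "bridge": 21.8, "car": 54.2,
}
ap_ours = {
    "aircraft": 95.2, "ship": 79.7, "oil tank": 73.7, "baseball field": 96.4,
    "tennis court": 71.6, "basketball court": 72.1, "playground": 99.7,
    "port": 73.2, "bridge": 57.0, "car": 72.0,
}

def mean_ap(ap_by_class):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(ap_by_class.values()) / len(ap_by_class)

baseline = round(mean_ap(ap_faster_rcnn), 1)   # mAP of the original Faster-RCNN
proposed = round(mean_ap(ap_ours), 1)          # mAP of the proposed method
gain = round(proposed - baseline, 1)           # improvement in percentage points
```

The rounded means reproduce the reported 66.9% and 79.1%, and their difference is the 12.2 percentage-point gain quoted in the abstract.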

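In terms of computational complexity, RICNN uses selective search to obtain regions of interest of different classes, scales, and positions in each image, normalizes them, and feeds them to the convolutional neural network. Because the regions of interest overlap, the convolution is computed repeatedly, which wastes computing resources. The Faster-RCNN adopted in this paper integrates region-of-interest extraction with the convolutional neural network, performs object recognition with an end-to-end network, and shares the convolutional results, which improves computational efficiency, lowers computational complexity, and increases detection speed.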

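4 Conclusion

To address the low accuracy and efficiency of object detection in remote sensing images, this paper uses an image super-resolution network to super-resolve the lower-resolution images in the dataset, providing a better dataset for training the neural network. To cope with complex backgrounds, small objects, and object rotation in remote sensing images, Faster-RCNN is improved in three respects, and the resulting attention convolutional neural network improves detection accuracy. Experiments on the small-sample NWPU_VHR-10 dataset give a mean detection accuracy of 79.1% on the test set. Comparisons with the original Faster-RCNN and with several existing remote sensing detection algorithms show that the super-resolution processing and the improvements to Faster-RCNN are better suited to object detection in remote sensing images, although detection accuracy on bridges still needs to be improved. In future work, oriented bounding boxes could be used when annotating the data to account for the many rotated objects in remote sensing images. We will also continue to look for optimization methods to improve the network further and to promote the application of convolutional neural networks to tasks such as object recognition and segmentation in remote sensing imagery.

References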

[1] Cheng G, Han J W. A survey on object detection in optical remote sensing images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 117: 11–28. [DOI:10.1016/j.isprsjprs.2016.03.014]

[2] Xu S, Fang T, Li D R, et al. Object classification of aerial images with bag-of-visual words[J]. IEEE Geoscience and Remote Sensing Letters, 2010, 7(2): 366–370. [DOI:10.1109/LGRS.2009.2035644]

[3] Han J, Zhou P, Zhang D, et al. Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 89: 37–48. [DOI:10.1016/j.isprsjprs.2013.12.011]

[4] Chen Y, Nasrabadi N M, Tran T D. Sparse representation for target detection in hyperspectral imagery[J]. IEEE Journal of Selected Topics in Signal Processing, 2011, 5(3): 629–640. [DOI:10.1109/JSTSP.2011.2113170]

[5] Sun H, Sun X, Wang H Q, et al. Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model[J]. IEEE Geoscience and Remote Sensing Letters, 2012, 9(1): 109–113. [DOI:10.1109/LGRS.2011.2161569]

[6] Lu Y F, Zhang S H. Object detection in optical remote sensing images with convolutional neural networks[J]. China Sciencepaper, 2017, 12(14): 1583–1589, 1633. [卢艺帆, 张松海. 基于卷积神经网络的光学遥感图像目标检测[J]. 中国科技论文, 2017, 12(14): 1583–1589, 1633. ] [DOI:10.3969/j.issn.2095-2783.2017.14.004]

[7] Cheng G, Zhou P C, Han J W. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7405–7415. [DOI:10.1109/TGRS.2016.2601622]

[8] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI: IEEE, 2017. [DOI:10.1109/CVPR.2017.690]

[9] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. [DOI:10.1007/978-3-319-46448-0_2]

[10] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE, 2014. [DOI:10.1109/CVPR.2014.81]

[11] Gao C X, Sang N. Deep learning for object detection in remote sensing image[J]. Bulletin of Surveying and Mapping, 2014(S1): 108–111. [高常鑫, 桑农. 基于深度学习的高分辨率遥感影像目标检测[J]. 测绘通报, 2014(S1): 108–111. ] [DOI:10.13474/j.cnki.11-2246.2014.0625]

[12] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: IEEE, 2012. [DOI:10.1145/3065386]

[13] Li C, Zhang Y C, Lan T, et al. An object detection algorithm with visual perception for high-resolution remote sensing images[J]. Journal of Xi'an Jiaotong University, 2018, 52(6): 9–16. [李策, 张亚超, 蓝天, 等. 一种高分辨率遥感图像视感知目标检测算法[J]. 西安交通大学学报, 2018, 52(6): 9–16. ] [DOI:10.7652/xjtuxb201806002]

[14] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015.

[15] Girshick R. Fast R-CNN[EB/OL].[2018-11-25].https://arxiv.org/pdf/1504.08083.pdf.

[16] Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI: IEEE, 2017.[DOI: 10.1109/CVPRW.2017.151]

[17] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV: IEEE, 2016.[DOI: 10.1109/CVPR.2016.90]

[18] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI: IEEE, 2017.[DOI: 10.1109/CVPR.2017.19]

[19] Rush A M, Chopra S, Weston J. A neural attention model for abstractive sentence summarization[EB/OL].[2018-11-25].https://arxiv.org/pdf/1509.00685.pdf.

[20] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[EB/OL].[2018-11-25].https://arxiv.org/pdf/1807.06521.pdf.

[21] Bodla N, Singh B, Chellappa R, et al. Soft-NMS-improving object detection with one line of code[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017.[DOI: 10.1109/ICCV.2017.593]

[22] Cheng G, Han J W, Zhou P C, et al. Multi-class geospatial object detection and geographic image classification based on collection of part detectors[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 98: 119–132. [DOI:10.1016/j.isprsjprs.2014.10.002]

[23] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84–90. [DOI:10.1145/3065386]