Published: 2017-07-16
DOI: 10.11834/jig.160627
2017 | Volume 22 | Number 7

Remote Sensing Image Processing

Convolutional neural network models for high spatial resolution satellite imagery classification

Zhou Mingfei, Wang Xili, Wang Lei, Chen Fen

School of Computer Science, Shaanxi Normal University, Xi'an 710119, China

Received: 2016-12-21; Revised: 2017-03-29

Supported by: National Natural Science Foundation of China (41171338, 41471280, 61401265)

First author: Zhou Mingfei (born 1993), male, master's student in computer application technology (engineering) at the School of Computer Science, Shaanxi Normal University. His research interests include intelligent information processing, remote sensing image processing, and pattern recognition. E-mail: 443164394@qq.com

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2017)07-0996-12

Abstract

Objective Satellite imagery classification assigns each image in a set of satellite images to one of several classes. The satellite images discussed in this paper are QuickBird satellite images and are divided into six classes: airplanes, dense residential areas, harbors, intersections, overpasses, and parking lots. The task is generally difficult because satellite images contain complex targets and backgrounds and are corrupted by noise. Traditional methods, such as artificial neural networks and support vector machines, usually rely on low-level, manually selected features. These features cannot represent the multi-level, intrinsic characteristics of satellite images, so methods built on them rarely reach high accuracy. Some deep learning methods use pre-trained convolutional neural networks to extract high-level features of satellite images and a separate classifier to classify them. These methods outperform the traditional ones, but they ignore the inherent classification capability of convolutional neural networks: training a network that extracts features and classifies images simultaneously requires a large amount of labeled satellite imagery, and such training data are limited in practice. Other methods classify satellite images with a stack of shallow convolutional neural networks, but the stacked low-level features are still not representative enough to improve the classification accuracy substantially. In this paper, a new approach based on deep convolutional neural networks is presented; the deep features extracted by these networks improve the classification accuracy for satellite imagery.

Method An end-to-end training and classification method is proposed. It requires neither additional classifiers nor stacks of shallow convolutional neural networks to improve feature extraction from satellite images. First, a new satellite imagery dataset containing six classes is proposed to address the lack of labeled training data. Second, three pre-trained deep convolutional neural network models and one directly trained shallow model are used to classify satellite images. The shallow model has few trainable weights and can be trained directly on the proposed dataset. The three deep models have too many trainable weights to be trained directly on the proposed dataset, so they are first pre-trained on a large auxiliary dataset, ILSVRC (ImageNet Large Scale Visual Recognition Challenge)-2012, which contains roughly 1 200 000 labeled training images in 1 000 classes, all showing common objects from daily life. Pre-training on this auxiliary dataset trains the weights of the three deep architectures adequately and improves their capability to extract representative features and to classify images. The models are then transferred from everyday objects to satellite image objects. The key step of this transfer is fine-tuning the pre-trained deep models on the proposed satellite image dataset, which requires a slight change to their architectures. After fine-tuning, the three deep models classify satellite images directly, without other classifiers or stacked shallow models.

Result The proposed convolutional neural network models are validated on two datasets: the proposed dataset and the widely used UC Merced Land Use dataset. All four models perform well on the proposed dataset, and the three deep models are more accurate than the shallow model. In particular, the deepest model achieves the highest accuracy, 99.50%. On the UC Merced Land Use dataset, the results of the proposed models are compared with those of three similar methods from the literature. Two of these methods use features extracted from pre-trained convolutional neural networks without fine-tuning, together with an additional classifier; the third uses a multi-view convolutional neural network. The experimental results indicate that the deepest proposed model achieves the highest accuracy (96.44%) among all the models.

Conclusion This paper proposes a new satellite image dataset that is general and representative of satellite images. Convolutional neural networks can be trained adequately on it: shallow networks can be trained on it directly, and pre-trained networks fine-tuned on it achieve high classification accuracy on another satellite imagery dataset. The proposed deep models are effective in deep feature extraction and satellite image classification and obtain more competitive results than other methods reported in the literature. They also generalize well, achieving high accuracy on the UC Merced Land Use dataset, whose images differ from those in the fine-tuning dataset in scale and quality. Effective pre-training and fine-tuning, together with the depth of the models, contribute to this performance. In addition, the proposed models are end-to-end models: no additional classifiers or stacks of shallow models are required to classify satellite images.

Key words

satellite imagery; classification; convolutional neural networks; model; feature

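0 Introduction

Satellite image classification assigns satellite images containing different targets to their corresponding classes. The satellite images discussed in this paper are mainly QuickBird satellite images. Noise, resolution, deformation, and background clutter make satellite image classification challenging. Traditional satellite image classification methods are mainly based on machine learning methods such as artificial neural networks (ANN)^[1] and support vector machines (SVM)^[2]. They usually rely on manually extracted features and have difficulty capturing multi-scale, high-level features of the data, which limits their classification accuracy. With the development of deep learning, deep learning methods have also been applied to feature extraction and classification of satellite images. Deep learning extracts multi-level features of the different classes in an image and uses them for classification, which makes higher classification accuracy possible.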

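In 2015, Zou et al.^[3] used a deep belief network (DBN) for feature extraction and classification of satellite images, and Luus et al.^[4] proposed a multi-scale-input convolutional neural network (CNN) model and applied it to satellite image classification, improving classification accuracy substantially over unsupervised feature learning methods. In the same year, Hu et al.^[5] fed high-level features extracted from pre-trained CNN models such as CaffeNet^[6] into traditional classification methods and obtained high classification accuracy on high-resolution satellite image datasets, outperforming methods based on SIFT features^[7]. A pre-trained CNN model here means a CNN model that was trained on the dataset of the ILSVRC (ImageNet^[8] large scale visual recognition challenge)^[9] and is then applied to satellite image classification. The ILSVRC^[9] involves classifying ordinary optical images into 1 000 classes, all depicting common objects from everyday life. Penatti et al.^[10] stacked ILSVRC^[9] pre-trained CNN models such as OverFeat^[11] and CaffeNet^[6] and, by fine-tuning the networks on a satellite image dataset, obtained high classification accuracy. This showed that CNN feature extraction is robust and that CNN models trained to extract features from everyday objects can be transferred to satellite image feature extraction. However, that work used only the OverFeat^[11] and CaffeNet^[6] models and gave no classification results for other CNN models on satellite images. In 2016, Marmanis et al.^[12] used the CNN model OverFeat^[11], pre-trained on the ILSVRC^[9] dataset, as a feature extractor and fed the features into another shallow network for classification. This improved classification accuracy, but the stacked model increased structural and computational complexity.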

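CNN models are now widely applied in high-resolution satellite image classification, but problems remain. First, training a CNN model requires a large number of labeled samples. Because training samples are scarce in this field, the methods above generally train and test on limited data using cross-validation or random selection; the models are not trained adequately, and the results are not fully convincing. Second, with so few labeled samples, a CNN model cannot be trained directly, nor can a deep CNN model be fine-tuned adequately. Third, the methods above generally stack CNN models, or combine CNN features with traditional classifiers, which increases model and computational complexity.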

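This paper addresses these points and builds an end-to-end satellite image classification model based on convolutional neural networks. First, a new satellite image dataset, Satellite-2000, is proposed to provide more labeled training samples; all samples in the public UC Merced Land Use dataset^[13] are used for testing only, so no training sample takes part in testing. Second, because enough training samples are available, one directly trained model and three fine-tuned pre-trained models are used for satellite image classification and their results are compared. The experiments show that, after fine-tuning, the deep model achieves a higher recognition rate on satellite image classification than the comparison methods. Finally, the classification models are end-to-end: features extracted by the CNN need not be fed into another classifier, and multiple CNN models need not be stacked.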

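1 Convolutional neural network models

Compared with basic machine learning methods, a CNN model stacks convolutional and pooling layers, which gives it a stronger capability to extract hierarchical and high-level features. The fully connected layers of a CNN model convert feature information into class information, so the model has strong classification capability in addition to feature extraction. In general, the more training samples are available, the more adequately the weights of the convolutional and fully connected layers are trained, the stronger the feature extraction capability, and the higher the classification accuracy. In satellite image classification, targets and scenes are more complex, recognition is much harder, and more training samples are needed to ensure the recognition capability of the CNN.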

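Four CNN models for satellite image classification are built on four CNN architectures: Simple-AlexNet^[14], CaffeNet^[6], VGG-16^[15], and VGG-19^[15]. Simple-AlexNet is a shallow model with few trainable weights and can be trained directly on the Satellite-2000 training set. CaffeNet, VGG-16, and VGG-19 have more complex structures and many more trainable weights. For better recognition performance, these three models are first pre-trained on the ILSVRC-2012 dataset, which contains 1.2 million images in 1 000 common object classes. With such ample training data, the weights of these complex CNN models can be trained adequately, which ensures their feature extraction and classification capability. After pre-training, the CNN models are fine-tuned on the Satellite-2000 training set so that they further learn the features of satellite images and classify them. The fine-tuned models can be used directly for satellite image classification.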

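The four trained models are tested on the Satellite-2000 test set and the UC Merced Land Use dataset. Fig. 1 shows the framework of the proposed method. In Fig. 1, the shallow model is Simple-AlexNet, and the deep models are CaffeNet, VGG-16, and VGG-19. This section describes the structure of the shallow model Simple-AlexNet and the structure and pre-training of the deep models; fine-tuning is described in the next section.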

Fig. 1 The proposed framework of satellite image classification

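1.1 CaffeNet model

CaffeNet is an implementation of AlexNet^[16] in the CNN toolbox Caffe (convolutional architecture for fast feature embedding)^[6]; its structure is shown in Fig. 2. In Fig. 2, "Input" denotes the input layer, and the numbers below it give the number of channels and the size of the input data. "Conv" denotes a convolutional layer^[17] and "Pool" a pooling layer; the numbers below these layer names give the number and size of the feature maps of the layer. "Fc" denotes a fully connected layer, and the number below it gives the number of nodes in the layer.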

Fig. 2 The architecture of the pre-trained CaffeNet convolutional neural network model

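The CaffeNet model consists of one data input layer, five convolutional layers (three of which are followed by a pooling layer), and three fully connected layers. The input is a three-channel RGB image of 227×227 pixels. The 1 000 neurons of layer Fc8 form the network output and correspond to the 1 000 classes of the ILSVRC-2012 dataset.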

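1.2 VGG-16 and VGG-19 models

The VGG-16 and VGG-19 models are improvements on AlexNet. Compared with AlexNet, the VGG family has two distinguishing features: 1) all convolutional layers use very small receptive fields (3×3 and 1×1); 2) the models have many convolutional layers and are far deeper than AlexNet.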

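VGG-16 takes a 224×224 RGB image as input and uses five convolution-pooling groups with 2, 2, 3, 3, and 3 convolutional layers, respectively; each group ends with one pooling layer. All convolutional layers use 3×3 kernels. The five groups are followed by three fully connected layers with 4 096, 4 096, and 1 000 nodes; the last one is the output layer, whose 1 000 nodes correspond to the 1 000 output classes. VGG-19 also uses five convolution-pooling groups; the difference is that its third, fourth, and fifth groups each contain four consecutive convolutional layers, with the same kernel size and number of kernels per layer as in VGG-16.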
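As a quick check of the layer counts above, the following Python sketch rebuilds the two layer sequences from the number of convolutional layers per group. The kernel counts per group (64, 128, 256, 512, 512) are read from Table 1; the function and constant names are illustrative only and do not come from the paper or from any library.

```python
# Convolutional layers per convolution-pooling group, as described in the text.
VGG16_GROUPS = [2, 2, 3, 3, 3]
VGG19_GROUPS = [2, 2, 4, 4, 4]
NUM_FC_LAYERS = 3  # two 4096-node layers plus the output layer


def weight_layers(conv_groups, num_fc=NUM_FC_LAYERS):
    """Count the layers that carry trainable weights (conv + fully connected)."""
    return sum(conv_groups) + num_fc


def layer_sequence(conv_groups, channels=(64, 128, 256, 512, 512), num_classes=1000):
    """List the layers in the notation of Table 1, e.g. 'Conv3-64'."""
    layers = []
    for n_conv, ch in zip(conv_groups, channels):
        layers += [f"Conv3-{ch}"] * n_conv  # n_conv stacked 3x3 convolutions
        layers.append("MaxPool")            # one pooling layer closes each group
    layers += ["Fc-4096", "Fc-4096", f"Fc-{num_classes}"]
    return layers


print(weight_layers(VGG16_GROUPS))  # 16
print(weight_layers(VGG19_GROUPS))  # 19
```

Calling `layer_sequence(VGG19_GROUPS, num_classes=6)` gives the VGG-19 column of Table 1 without the input and Soft-Max rows.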

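Small receptive fields and a deep structure allow a CNN model to extract more accurate and deeper feature representations, which raises its classification accuracy. The VGG models achieved higher classification accuracy than AlexNet on the datasets of the ILSVRC-2012 and ILSVRC-2014 challenges.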

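1.3 Simple-AlexNet model

Simple-AlexNet is a structural simplification of the conventional AlexNet, designed for small-sample settings. AlexNet needs millions of training samples to train its weights, whereas Simple-AlexNet reduces the number of trainable weights by simplifying the structure, so that tens of thousands of samples suffice to train it. Simple-AlexNet takes a 227×227 RGB image as input. The input layer is followed by the first convolutional layer, which has 96 kernels of size 7×7. The second convolutional layer has 256 kernels of size 5×5, and the third has 384 kernels of size 3×3; each convolutional layer is followed by a pooling layer. Next come two fully connected layers with 512 nodes each, followed by the output layer.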
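To give a rough sense of scale, the sketch below counts the trainable parameters of the three convolutional layers just described. It assumes ordinary (ungrouped) convolutions with one bias per kernel, which the text does not state. The fully connected layers are left out because their size depends on strides and pooling settings that are not given here. All names are illustrative.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of one ungrouped 2-D convolutional layer (assumed form)."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# (kernel size, number of kernels) for the three convolutional layers in the text.
SIMPLE_ALEXNET_CONVS = [(7, 96), (5, 256), (3, 384)]


def total_conv_params(convs, input_channels=3):
    """Sum the parameters over a chain of convolutional layers."""
    total, in_ch = 0, input_channels
    for kernel_size, out_ch in convs:
        total += conv_params(kernel_size, in_ch, out_ch)
        in_ch = out_ch  # the next layer sees every feature map of this one
    return total


print(total_conv_params(SIMPLE_ALEXNET_CONVS))
```

Under these assumptions the three convolutional layers together hold about 1.5 million parameters.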

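Simple-AlexNet has the simplest structure and the fewest layers and is easy to train, but its simple structure also limits its capability to extract high-level features. From CaffeNet to VGG-16 to VGG-19, model depth increases step by step. After adequate pre-training, the deep models extract higher-level and more robust features than the shallow model. In addition, because their fully connected layers are also trained adequately, the deep models achieve higher classification accuracy than the shallow model. The three deep models were pre-trained on the ILSVRC-2012 dataset with the Caffe toolbox^[6]. After pre-training, the Top-1 error rates of CaffeNet, VGG-16, and VGG-19 are 42.6%, 27.0%, and 25.5%, and their Top-5 error rates are 19.6%, 8.8%, and 8.0%, respectively. The Top-1 error rate is the fraction of test samples that are misclassified; the Top-5 error rate is the fraction of test samples whose correct class is not among the five classes ranked highest by the model.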
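The two error rates defined above can be computed as in the following minimal sketch. The scores and labels are made-up toy values over six classes, not results from the paper, and `top_k_error` is an illustrative helper rather than a Caffe function.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy scores for three samples over six classes (illustrative values only).
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # true class 1 is ranked first
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true class 2 is ranked second
    [0.40, 0.30, 0.10, 0.10, 0.06, 0.04],  # true class 5 is ranked last
]
labels = [1, 2, 5]

top1 = top_k_error(scores, labels, k=1)  # 2 of 3 samples are missed
top5 = top_k_error(scores, labels, k=5)  # 1 of 3 samples is missed
```

The Top-5 error can never exceed the Top-1 error, which matches the pairs of rates reported for the three deep models.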

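2 Fine-tuning pre-trained models for satellite image classification

The three pre-trained deep CNN models are fine-tuned on the proposed Satellite-2000 training set so that they can perform satellite image classification^[18]. The Satellite-2000 dataset enlarges the pool of training samples, so the CNN models can be fine-tuned more thoroughly.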

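2.1 Satellite-2000 dataset

The UC Merced Land Use dataset^[13] is widely used in satellite image recognition. It contains 21 classes of satellite image targets with 100 images per class. The images have a spatial resolution of 1 foot and a size of 256×256 pixels, and were selected manually from the USGS National Map Urban Area Imagery Collection.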

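The training set of the proposed Satellite-2000 dataset contains six classes of satellite image targets: Airplane, Dense Residential, Harbor, Intersection, Overpass, and Parking Lot. These six classes are also present in the UC Merced Land Use dataset. For each class, 500 images were collected and then rotated clockwise by 90°, 180°, and 270° to enlarge the set, so each class finally contains 2 000 images. The test set of Satellite-2000 contains the same six classes as the training set, with 100 images per class. Note that the test images were not augmented by rotation. All images in Satellite-2000 are 256×256 pixels. Unlike UC Merced Land Use, Satellite-2000 consists of satellite images selected manually from Google Earth. Although the images in the two datasets have the same size and similar resolution, the Satellite-2000 images are sharper and of higher quality.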
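The rotation-based augmentation can be illustrated with a small pure-Python sketch. It treats an image as a list of pixel rows and is meant only to show the bookkeeping (one original plus three rotated copies per image); it is not the authors' preprocessing code, and the helper names are hypothetical.

```python
def rotate_cw(image):
    """Rotate a 2-D image (a list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image followed by its 90, 180 and 270 degree rotations."""
    views = [image]
    for _ in range(3):
        views.append(rotate_cw(views[-1]))
    return views


COLLECTED_PER_CLASS = 500  # images collected per class (from the text)
NUM_CLASSES = 6

tiny = [[1, 2],
        [3, 4]]
views = augment(tiny)

per_class = COLLECTED_PER_CLASS * len(views)  # 500 * 4 = 2000
training_set_size = per_class * NUM_CLASSES   # 2000 * 6 = 12000
```

Because the images are square (256×256 pixels), every rotated copy keeps the original size, so no cropping or padding is needed.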

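Satellite-2000 is proposed to address the shortage of samples in the UC Merced Land Use dataset. Previous work generally used cross-validation, training and testing CNN models on UC Merced Land Use itself. Satellite-2000 offers more options for training: training and testing can be separated and carried out on the Satellite-2000 training set and test set, respectively. With the enlarged data volume, fine-tuned CNN models reach higher classification accuracy on the test set, and small CNN models can be trained and tested directly on Satellite-2000. Fig. 3 shows sample images from the UC Merced Land Use and Satellite-2000 datasets.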

Fig. 3 Sample images from the UC Merced Land Use dataset and the Satellite-2000 dataset

((a) UC Merced Land Use; (b) Satellite-2000 training set; (c) Satellite-2000 test set)

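2.2 Fine-tuning the pre-trained CNN models

Among the four CNN models used in this paper (Simple-AlexNet, CaffeNet, VGG-16, and VGG-19), Simple-AlexNet is a shallow CNN model with far fewer weights than the other three. It is therefore trained directly on the Satellite-2000 training set rather than fine-tuned, and serves as a baseline for comparison with the fine-tuned pre-trained models.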

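After pre-training, CaffeNet, VGG-16, and VGG-19 already recognize the 1 000 object classes of the ILSVRC-2012 dataset well. To transfer these models, pre-trained on common objects, to satellite image classification, they are fine-tuned on the Satellite-2000 dataset.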

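Fine-tuning a CNN model has two stages. The first is to change the structure of the pre-trained CNN model. A CNN model can be divided into convolution-pooling layers and fully connected layers. The convolution-pooling layers perform feature extraction and feature mapping, that is, they extract a high-level feature representation of the input data. The fully connected layers map this representation to classes, and the last fully connected layer outputs the classification result. The structural change concerns the output layer (layer Fc8) of the pre-trained CNN model. During pre-training the model classifies 1 000 classes, so its output layer has 1 000 nodes; for satellite image classification the model outputs six classes, so the output layer is changed to six nodes. Table 1 lists the configurations of all CNN models after this change. In the table, "Input" denotes the input layer, "Conv" a convolutional layer, "MaxPool" a max-pooling layer, and "Fc" a fully connected layer; "Soft-Max" indicates that the model uses a Soft-Max classifier. The numbers after "Input" give the size of the input image, the numbers after "Conv" give the kernel size and the number of kernels, and the number after "Fc" gives the number of nodes in the fully connected layer.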

Table 1 The architectures of the fine-tuned pre-trained CNN models

Simple-AlexNet	CaffeNet	VGG-16	VGG-19
Input 227×227×3	Input 227×227×3	Input 224×224×3	Input 224×224×3
Conv7-96	Conv11-96	Conv3-64	Conv3-64
MaxPool	MaxPool	Conv3-64	Conv3-64
Conv5-256	Conv5-256	MaxPool	MaxPool
MaxPool	MaxPool	Conv3-128	Conv3-128
Conv3-384	Conv3-384	Conv3-128	Conv3-128
MaxPool	Conv3-384	MaxPool	MaxPool
Fc-512	Conv3-256	Conv3-256	Conv3-256
Fc-512	MaxPool	Conv3-256	Conv3-256
Fc-6	Fc-4096	Conv3-256	Conv3-256
Soft-Max	Fc-4096	MaxPool	Conv3-256
	Fc-6	Conv3-512	MaxPool
	Soft-Max	Conv3-512	Conv3-512
		Conv3-512	Conv3-512
		MaxPool	Conv3-512
		Conv3-512	Conv3-512
		Conv3-512	MaxPool
		Conv3-512	Conv3-512
		MaxPool	Conv3-512
		Fc-4096	Conv3-512
		Fc-4096	Conv3-512
		Fc-6	MaxPool
		Soft-Max	Fc-4096
			Fc-4096
			Fc-6
			Soft-Max

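Once the network structure is fixed, the second stage is fine-tuning, that is, retraining the CNN model on the Satellite-2000 training set. Only the weights of the last fully connected layer (the output layer) are initialized with Gaussian random numbers; all other weights of the CNN model are initialized with the corresponding weights of the pre-trained model. The models are fine-tuned on the Satellite-2000 training set, and their classification accuracy is measured on the Satellite-2000 test set and the UC Merced Land Use dataset.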
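The initialization scheme described above can be sketched in plain Python as follows. This is not Caffe code: the model is represented as a hypothetical dictionary mapping layer names to weight matrices, the layer name `fc8` follows the CaffeNet naming used in the text, and the standard deviation of 0.01 is an assumed value, since the text only states that Gaussian random numbers are used.

```python
import random


def init_for_finetuning(pretrained, num_classes=6, output_layer="fc8",
                        std=0.01, seed=0):
    """Copy pre-trained weights for every layer except the output layer.

    The output layer is rebuilt with `num_classes` rows and zero-mean Gaussian
    weights. `std` is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    model = {}
    for name, weights in pretrained.items():
        if name == output_layer:
            fan_in = len(weights[0])  # inputs per output node stay the same
            model[name] = [[rng.gauss(0.0, std) for _ in range(fan_in)]
                           for _ in range(num_classes)]
        else:
            model[name] = [row[:] for row in weights]  # reuse pre-trained values
    return model


# Toy pre-trained model: a 2x2 hidden layer and a 1000-way output layer.
pretrained = {
    "fc7": [[0.1, 0.2], [0.3, 0.4]],
    "fc8": [[0.0, 0.0] for _ in range(1000)],
}
model = init_for_finetuning(pretrained)
```

After this step every layer, including the copied ones, would continue to be updated during training on the Satellite-2000 training set.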

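3 Experimental results

Simple-AlexNet is trained on the Satellite-2000 training set and tested on the Satellite-2000 test set and the UC Merced Land Use dataset. The pre-trained CaffeNet, VGG-16, and VGG-19 models are fine-tuned on the Satellite-2000 training set and tested on the same two datasets.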

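3.1 Results on the Satellite-2000 test set

Training and fine-tuning run for 24 000 iterations. Simple-AlexNet, CaffeNet, and VGG-16 are trained with a batch size of 64 images and VGG-19 with a batch size of 32; the network is tested once every 1 000 training iterations. The base learning rate is 0.001. Training and fine-tuning use the Caffe toolbox on an NVIDIA Tesla K40 GPU. The CNN models are trained and fine-tuned on the Satellite-2000 training set and tested on the Satellite-2000 test set; Table 2 lists their classification accuracies.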

Table 2 The classification accuracies of CNN models on the Satellite-2000 test set

CNN model	Accuracy/%
Simple-AlexNet	95.56
Pre-trained CaffeNet	97.96
Pre-trained VGG-16	97.88
Pre-trained VGG-19	99.50

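Table 2 shows that all four models achieve high accuracy; the pre-trained VGG-19 model achieves the highest, close to 100%. Overall, the pre-trained models are more accurate than the directly trained model because a deeper structure brings stronger feature extraction capability. In addition, pre-training trains the model weights adequately, and fine-tuning lets the CNN models learn feature representations specific to satellite images. Together, these factors allow models pre-trained on the ILSVRC-2012 dataset to extract high-level features from satellite images well after fine-tuning. Fig. 4 shows the loss and accuracy curves of the four models during training and fine-tuning.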

Fig. 4 Training curves of the CNN models tested on the Satellite-2000 test set

((a) Simple-AlexNet; (b) CaffeNet; (c) VGG-16; (d) VGG-19)

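The loss and accuracy curves show that the VGG models are more stable than the other two models. VGG-19 shows the fastest loss decrease and the fastest accuracy increase, and its training loss and test loss agree most closely. This indicates that VGG-19 has the strongest feature extraction capability, and fine-tuning the pre-trained VGG-19 model gives the highest satellite image classification accuracy. Because VGG-19 learned, during pre-training, deep weights that apply broadly to everyday objects and transfer well, it can be adapted quickly to satellite image classification by fine-tuning; its loss and accuracy curves are therefore steep and smooth. The test loss curve of VGG-16 oscillates once, mainly because overly large weight updates early in training caused overfitting, but it stabilizes afterwards. The training loss and accuracy curves of CaffeNet oscillate several times: the learning rate was set somewhat high to speed up convergence, and the weights CaffeNet learned during pre-training are less robust and cannot be transferred quickly to satellite image classification. Late in training, the CaffeNet curves level off. The curves of Simple-AlexNet are relatively stable, but the limited depth of the model makes further gains in classification accuracy difficult.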

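3.2 Results on the UC Merced Land Use dataset

Training and fine-tuning run for 24 000 iterations. Simple-AlexNet, CaffeNet, and VGG-16 are trained with a batch size of 64 images and VGG-19 with a batch size of 32; the network is tested once every 1 000 training iterations. The base learning rate is 0.001. The networks are trained and fine-tuned on the Satellite-2000 training set and tested on the UC Merced Land Use dataset.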
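For orientation, the training schedule can be converted into passes over the training set. The sketch below assumes that every iteration processes one full batch and that the training set holds 6 × 2 000 = 12 000 images, as described in Section 2.1; the function name is illustrative.

```python
TRAIN_IMAGES = 6 * 2000  # six classes, 2 000 training images each
ITERATIONS = 24000
TEST_INTERVAL = 1000     # the network is tested every 1 000 iterations


def epochs(iterations, batch_size, dataset_size):
    """Number of passes over the training set made by the schedule."""
    return iterations * batch_size / dataset_size


epochs_batch64 = epochs(ITERATIONS, 64, TRAIN_IMAGES)  # 128.0
epochs_batch32 = epochs(ITERATIONS, 32, TRAIN_IMAGES)  # 64.0
num_tests = ITERATIONS // TEST_INTERVAL                # 24
```

With the same number of iterations, VGG-19 therefore sees each training image half as often as the other three models.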

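The proposed models are compared in accuracy with other models evaluated on the UC Merced Land Use dataset. Note that the proposed models report the average accuracy over six classes of UC Merced Land Use, whereas the other models report the average accuracy over all 21 classes. For the proposed models, the UC Merced Land Use images are used only for testing and never for training, whereas the accuracies of the other models are cross-validation results. The pre-trained OverFeat model^[12] classifies satellite images with the pre-trained CNN model OverFeat, Multiview-CNN^[4] uses multi-scale input to raise the accuracy of a CNN model, and the pre-trained AlexNet model^[5] classifies satellite images with the classic pre-trained AlexNet model. Table 3 lists the classification accuracy of each model. Fig. 5 shows the loss and accuracy curves of the four proposed models when tested on the UC Merced Land Use dataset.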

Table 3 The classification accuracies of CNN models on the UC Merced Land Use dataset

CNN model	Accuracy/%
Pre-trained AlexNet^[5]	94.37
Pre-trained OverFeat^[12]	92.40
Multiview-CNN^[4]	93.48
Simple-AlexNet	86.88
Pre-trained CaffeNet	90.24
Pre-trained VGG-16	93.64
Pre-trained VGG-19	96.44

Fig. 5 Training curves of the CNN models tested on the UC Merced Land Use dataset

((a) Simple-AlexNet; (b) CaffeNet; (c) VGG-16; (d) VGG-19)
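The accuracies of the proposed models in Table 3 are averages over six classes. The text does not spell out how the average is formed, so the sketch below shows one plausible reading: the mean of the per-class accuracies. The labels and predictions are toy values, and the helper names are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {c: correct[c] / total[c] for c in total}


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class accuracies."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


# Toy example with two classes of unequal size (illustrative values only).
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1]
class_mean = mean_class_accuracy(y_true, y_pred)                 # (0.5 + 1.0) / 2
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

When every class has the same number of test images, as in UC Merced Land Use with 100 images per class, the per-class mean and the overall accuracy coincide.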

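Table 3 shows that the three proposed pre-trained models all achieve high classification accuracy, and VGG-19 achieves the highest accuracy of all models. The results also show that the fine-tuned pre-trained models are clearly more accurate than the directly trained model. This indicates that fine-tuned pre-trained models, especially the deep CNN model VGG-19, extract more robust feature representations of the input images; even when tested on a dataset that was never used for training and that differs from the training samples in scale and quality, the deep models perform well.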

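Compared with the pre-trained AlexNet, pre-trained OverFeat, and Multiview-CNN models, which use cross-validation or randomly selected training and test samples on UC Merced Land Use, the proposed VGG-19 model also achieves the highest classification accuracy. First, the results indicate that the Satellite-2000 dataset built by the authors is general and representative of satellite images: CNN models trained and fine-tuned on Satellite-2000 also achieve high accuracy on another satellite dataset. Moreover, because the proposed models are tested once on all the UC Merced Land Use samples rather than by cross-validation or random selection of test samples, the recognition rates obtained on UC Merced Land Use are more convincing. Second, the end-to-end training scheme of the fine-tuned pre-trained models is effective: the fine-tuned models achieve high classification accuracy without other classifiers and without stacking models. Finally, compared with the models in the literature, the deep model VGG-19 has the strongest feature extraction and transfer capability, which indicates that model depth strongly affects feature extraction and classification accuracy.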

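Fig. 5 shows that the network converges fastest with VGG-19 and that its test loss decreases the most. VGG-19 also achieves the highest classification accuracy on the UC Merced Land Use dataset. Although the Satellite-2000 and UC Merced Land Use datasets differ in image quality, color, background, and texture, the strong feature extraction capability of the deep VGG-19 model allows it, after fine-tuning on Satellite-2000, to achieve high classification accuracy on UC Merced Land Use. Moreover, during fine-tuning of VGG-19 the loss and accuracy curves change quickly at first and become more stable later. For VGG-16 and CaffeNet, the weights learned during pre-training are not robust or transferable enough; when the training and test datasets differ considerably, small weight adjustments can cause large fluctuations in the loss and accuracy curves, and these fluctuations are more pronounced for CaffeNet. For Simple-AlexNet, limited depth means that the weights trained on the training dataset cannot extract the features of the test images well, which causes fluctuations in the loss and accuracy curves and makes the classification accuracy hard to improve.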

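4 Conclusion

This paper proposes the Satellite-2000 satellite image dataset, which offers more options for training and fine-tuning CNN models for satellite image classification. Thanks to the enlarged sample set, the four proposed models achieve high classification accuracy on the Satellite-2000 test set. The Satellite-2000 dataset also generalizes: models trained on it achieve high test accuracy on another dataset. Second, this paper proposes an effective end-to-end scheme for CNN training and classification that achieves high classification accuracy without other classifiers or model stacking. Finally, the fine-tuned pre-trained VGG-19 model has the strongest feature extraction capability of all the models: after fine-tuning on the Satellite-2000 training set, it reaches a recognition rate of 99.50% on the Satellite-2000 test set and 96.44% on the UC Merced Land Use dataset, higher than the comparison models.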

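Overall, deep CNN models have strong feature extraction and classification capability and can extract high-level, representative features from satellite images. However, CNN models, especially deep ones, depend heavily on training samples. Training requires a large number of samples, and manually annotated datasets are severely lacking in satellite image processing. Future work will focus on applications of CNN models in remote sensing, exploring their use in remote sensing image segmentation and object detection^[19-21]. To address the shortage of samples, the research will concentrate on weakly supervised learning and on making the most of existing samples.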

References

[1] Mahmon N A, Ya'Acob N. A review on classification of satellite image using artificial neural network(ANN)[C]//Proceedings of 2014 IEEE the 5th Control and System Graduate Research Colloquium. Shah Alam, Malaysia: IEEE, 2014: 153-157. [DOI: 10.1109/ICSGRC.2014.6908713]

[2] Hwang J T, Chang K T, Chiang H C. Satellite image classification based on Gabor texture features and SVM[C]//Proceedings of 2011 the 19th International Conference on Geoinformatics. Shanghai, China: IEEE, 2011: 1-6. [DOI: 10.1109/GeoInformatics.2011.5980774]

[3] Zou Q, Ni L H, Zhang T, et al. Deep learning based feature selection for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(11): 2321–2325. [DOI: 10.1109/LGRS.2015.2475299]

[4] Luus F P S, Salmon B P, van den Bergh F, et al. Multiview deep learning for land-use classification[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(12): 2448–2452. [DOI: 10.1109/LGRS.2015.2483680]

[5] Hu F, Xia G S, Hu J W, et al. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery[J]. Remote Sensing, 2015, 7(11): 14680–14707. [DOI: 10.3390/rs71114680]

[6] Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia. New York, NY, USA: ACM, 2014: 675-678. [DOI: 10.1145/2647868.2654889]

[7] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110. [DOI: 10.1023/B:VISI.0000029664.99615.94]

[8] Deng J, Dong W, Socher R, et al. Imagenet: a large-scale hierarchical image database[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Miami, Florida: IEEE, 2009: 248-255. [DOI: 10.1109/CVPR.2009.5206848]

[9] Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211–252. [DOI: 10.1007/s11263-015-0816-y]

[10] Penatti O A B, Nogueira K, dos Santos J A. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, MA, USA: IEEE, 2015: 44-51. [DOI: 10.1109/CVPRW.2015.7301382]

[11] Sermanet P, Eigen D, Zhang X, et al. OverFeat: integrated recognition, localization and detection using convolutional networks[C]//International Conference on Learning Representations. Banff, Canada: ICLR, 2014.

[12] Marmanis D, Datcu M, Esch T, et al. Deep learning earth observation classification using imageNet pretrained networks[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(1): 105–109. [DOI: 10.1109/LGRS.2015.2499239]

[13] Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. New York, NY, USA: ACM, 2010: 270-279. [DOI: 10.1145/1869790.1869829]

[14] Levi G, Hassner T. Age and gender classification using convolutional neural networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, MA, USA: IEEE, 2015: 34-42. [DOI: 10.1109/CVPRW.2015.7301352]

[15] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[16] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM, 2012: 1097-1105.

[17] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. [DOI: 10.1109/5.726791]

[18] Zhong Y F, Fei F, Zhang L P. Large patch convolutional neural networks for the scene classification of high spatial resolution imagery[J]. Journal of Applied Remote Sensing, 2016, 10(2): #025006. [DOI: 10.1117/1.JRS.10.025006]

[19] Basaeed E, Bhaskar H, Hill P, et al. A supervised hierarchical segmentation of remote-sensing images using a committee of multi-scale convolutional neural networks[J]. International Journal of Remote Sensing, 2016, 37(7): 1671–1691. [DOI: 10.1080/01431161.2016.1159745]

[20] Zhang P Z, Gong M G, Su L Z, et al. Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 116: 24–41. [DOI: 10.1016/j.isprsjprs.2016.02.013]

[21] Längkvist M, Kiselev A, Alirezaie M, et al. Classification and segmentation of satellite orthoimagery using convolutional neural networks[J]. Remote Sensing, 2016, 8(4): 329. [DOI: 10.3390/rs8040329]