Published: 2019-04-16
DOI: 10.11834/jig.180422
2019 | Volume 24 | Number 4

Remote Sensing Image Processing

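Doubleconvpool-structured 3D-CNN for hyperspectral remote sensing image classification

Li Guandong¹, Zhang Chunju¹, Gao Fei¹, Zhang Xueying²

1. School of Civil and Hydraulic Engineering, Hefei University of Technology, Hefei 230009;

2. Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210046

Received: 2018-07-04; Revised: 2018-10-07

Supported by: National Natural Science Foundation of China (41401451, 41671456, 41671393)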

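First author: Li Guandong, born 1993, male, master's student; research interest: deep learning for remote sensing image processing. E-mail: leeguandon@gmail.com;
Gao Fei, male, professor; research interests: precision engineering surveying, error theory, and global satellite positioning systems. E-mail: gaofei@hfut.edu.cn;
Zhang Xueying, female, professor; research interests: spatial data mining and geographic information systems. E-mail: zhangsnowy@163.com.

CLC number: TP237

Document code: A

Article ID: 1006-8961(2019)04-0639-16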

Abstract

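Objective Hyperspectral remote sensing images contain rich spatial and spectral information, but the high dimensionality of the signal, information redundancy, multiple uncertainties, and the phenomena of the same material exhibiting different spectra and different materials exhibiting the same spectrum make the structure of hyperspectral data highly nonlinear. A 3D-CNN (3D convolutional neural network) can exploit the data-cube nature of hyperspectral images to fuse spectral and spatial information and extract the discriminative features that matter for classification. We therefore propose a 3D-CNN with a doubleconvpool structure for hyperspectral remote sensing image classification. Method The doubleconvpool structure consists of two convolutional layers, two BN (batch normalization) layers, and one pooling layer. It accounts for both the scarcity of labeled hyperspectral data and the balance between the high dimensionality of hyperspectral images and model depth. The model makes full use of the semantic information provided by joint spatial-spectral features, which facilitates feature extraction from small-sample, high-dimensional hyperspectral images. The doubleconvpool-structured 3D-CNN takes 3D remote sensing images without feature engineering as input, and the resulting deep learning classifier is trained end to end without complex preprocessing. In addition, the model uses regularization strategies such as BN and Dropout to avoid overfitting. Result Compared with SVM (support vector machine), SAE (stacked autoencoder), and current mainstream CNN methods, the model achieves overall classification accuracies of up to 99.65% and 99.82% on the Indian Pines and Pavia University datasets, respectively, effectively improving land-cover classification accuracy for hyperspectral images. Conclusion We discuss the effects of six factors on the results: the number of doubleconvpool structures, the regularization strategy, the spectral sampling stride of the first convolutional layer, the kernel size, the neighborhood pixel block size, and the learning rate. The proposed doubleconvpool structure can be combined and reused according to the characteristics of a dataset, and compared with other deep learning models it requires fewer parameters and is computationally more efficient.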

Key words

3D-CNN; doubleconvpool structure; joint spatial-spectral features; hyperspectral image classification; regularization strategy

Doubleconvpool-structured 3D-CNN for hyperspectral remote sensing image classification

Li Guandong¹, Zhang Chunju¹, Gao Fei¹, Zhang Xueying²

1. School of Civil Engineering, Hefei University of Technology, Hefei 230009, China;

2. Key Laboratory of Virtual Geographical Environment (Nanjing Normal University), Ministry of Education, Nanjing 210046, China

Supported by: National Natural Science Foundation of China (41401451, 41671456, 41671393)

Abstract

Objective Hyperspectral remote sensing image data are rich in spatial and spectral information. Information from contiguous spectral bands enhances the capability to distinguish between ground objects, and such data have been widely used in image classification, target detection, agricultural monitoring, and environmental management. However, the data structure of hyperspectral remote sensing images is highly nonlinear because of the high dimensionality of the signal, information redundancy, and multiple uncertainties. Classification models based on statistical pattern recognition have difficulty classifying raw hyperspectral data directly. Moreover, training samples for supervised learning are extremely limited, which leads to the Hughes phenomenon: with a limited number of training samples, classification accuracy decreases as the feature dimension increases. Traditional pixel-level hyperspectral classification methods mostly adopt a framework of feature extraction followed by a classifier. For feature extraction, a series of spectral dimensionality reduction methods has been proposed to deal with the high spectral dimensionality of hyperspectral data; however, these methods cannot solve the nonlinearity of hyperspectral data. Methods that use only spectral information largely ignore the rich spatial structure of the images, so their classification maps often contain many scattered isolated pixels and the classification accuracy drops considerably. Introducing spatial information is therefore necessary. In recent years, deep learning-based image classification has become a research hotspot. Compared with traditional hand-crafted features, deep learning automatically extracts abstract features ranging from low-level to high-level semantics and converts images into high-level features that are easy to recognize. 
At present, mainstream methods reduce the dimensionality of the image with PCA (principal component analysis), feed the result into a 2D-CNN (2D convolutional neural network), and fuse it with spectral information at a later stage to extract spatial-spectral information. However, these methods must extract spatial and spectral information separately, do not take advantage of joint spatial-spectral information, and require complex preprocessing. By contrast, a 3D-CNN extracts spatial and spectral information simultaneously: it exploits the data-cube structure of hyperspectral remote sensing images to fully fuse spectral and spatial information, extracts the discriminative features important for classification, and effectively addresses spatial homogeneity and heterogeneity. Using 3D-CNNs for spatial-spectral feature extraction from hyperspectral images has therefore become a development trend in image classification. However, existing methods of this kind merely stack convolutional layers, do not fully exploit the strengths of 3D-CNNs, and have low scalability. This study proposes a 3D-CNN model based on a doubleconvpool structure. Method The doubleconvpool structure includes two convolutional layers, two BN (batch normalization) layers, and one pooling layer. It considers both the lack of labeled data for hyperspectral images and the balance between their high dimensionality and model depth. Rather than using only spatial or only spectral information, the model fully uses the semantic information provided by joint spatial-spectral information, thereby facilitating feature extraction from hyperspectral images with small samples and high dimensionality. 
In the doubleconvpool-structured 3D-CNN, 3D remote sensing images without feature engineering are used as input, and the deep learning model is trained end to end without complicated preprocessing. The model also uses regularization strategies, such as BN and Dropout, to avoid overfitting. We use the doubleconvpool structure as a standard component of the network and treat the number of such structures as an important hyperparameter. For images with different data characteristics, good classification results are achieved by choosing an appropriate number of doubleconvpool structures. This avoids having to substantially modify the network's hyperparameters when it is applied to a different dataset and enhances the scalability of the network. Result The experiments compare the model with SVM (support vector machine), SAE (stacked autoencoder), and current mainstream CNN methods. The model achieves overall classification accuracies of 99.65% and 99.82% on the Indian Pines and Pavia University datasets, respectively, effectively improving the classification accuracy of hyperspectral remote sensing images. We analyze and discuss the number of doubleconvpool structures, the regularization strategy, the spectral sampling stride of the first convolutional layer, the convolution kernel size, the size of neighboring pixel blocks, and the learning rate, providing reasonable model choices under different constraints such as training time and computational cost. Conclusion The doubleconvpool structure can be combined and reused according to the characteristics of a dataset. Compared with other deep learning models, it requires fewer parameters and has higher computational efficiency. The results further demonstrate the potential of deep learning, and of 3D-CNNs in particular, for hyperspectral image classification.

Key words

3D-CNN; doubleconvpool structure; spatial-spectral information; hyperspectral image classification; regularization strategy

0 Introduction

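A hyperspectral remote sensing image is a data cube containing both spectral and spatial information. It records hundreds of contiguous spectral bands for ground objects, and this rich spectral information enhances the ability to discriminate between land-cover types^[1-2]. Hyperspectral image classification assigns a unique class label to every pixel in the image. Owing to the high dimensionality of the spectra, information redundancy, and the phenomena of the same material exhibiting different spectra and different materials exhibiting the same spectrum, the structure of hyperspectral data is highly nonlinear. Classification models based on statistical pattern recognition have difficulty classifying raw hyperspectral data directly^[3], and the training samples available for supervised learning are very limited, which gives rise to the Hughes phenomenon: beyond a certain point, classification accuracy decreases as the feature dimension increases. Hyperspectral image classification therefore requires both an effective classification model and full use of the rich spatial and spectral information^[4].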

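Traditional pixel-level hyperspectral classification methods mostly follow a framework of feature extraction followed by a classifier. For feature extraction, early work addressed the high spectral dimensionality with a series of spectral dimensionality reduction methods such as principal component analysis (PCA)^[5] and independent component analysis (ICA)^[6], but these linear methods cannot handle the nonlinearity of hyperspectral data. Manifold learning^[7] captures the intrinsic structure of hyperspectral data to achieve nonlinear dimensionality reduction and thereby improves classification accuracy. However, classifying hyperspectral images with spectral information alone largely ignores the rich spatial structure of the image; the resulting classification maps often contain many scattered isolated pixels, which greatly reduces accuracy^[8]. Some researchers have combined spectral and spatial information, designing a sparse graph structure based on total variation optimization^[9] and an extended morphological attribute profile algorithm based on sparse representation^[10] for hyperspectral classification. Common classifiers include the support vector machine (SVM), minimum distance classification, and artificial neural networks (NN)^[11].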

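In recent years, deep learning-based image classification has become a research hotspot. Compared with traditional hand-crafted features, deep learning automatically extracts abstract features ranging from low-level to high-level semantics and transforms images into high-level features that are easier to recognize; a classifier then maps image pixels to labels. For example, the stacked autoencoder (SAE)^[12] and the deep belief network (DBN)^[13] learn hyperspectral image features in an unsupervised manner to extract and classify spatial-spectral features. Although SAE and DBN can extract deep features hierarchically through layer-wise training, training samples composed of image patches must be flattened into 1D vectors to meet the input requirements of their fully connected layers, which discards the spatial structure of the original image. Zhao et al.^[14] reduced the dimensionality of the original hyperspectral image with PCA, used a 2D-CNN (2D convolutional neural network) to extract spatial features from the reduced image, extracted spectral features with the balanced local discriminant embedding (BLDE) algorithm, and finally fused the spatial and spectral features and fed them into a classifier to improve accuracy. Yue et al.^[15] extracted deep features with a deep CNN and classified them with a logistic regression classifier (LRC). However, 2D-CNN-based methods^[16] must extract spatial and spectral information separately, cannot fully exploit joint spatial-spectral information, and require complex preprocessing. Zhong et al.^[17] proposed the supervised spectral-spatial residual network (SSRN) based on 3D-CNN, which uses consecutive spectral and spatial residual blocks to extract spectral and spatial features separately. Because a 3D-CNN can already sample the spatial and spectral dimensions simultaneously, designing separate feature extraction modules makes the network redundant. In this study, we take the 3D-CNN as the basic architecture and design a hyperspectral remote sensing image classification method based on a doubleconvpool structure. The doubleconvpool structure uses 3×3×3 convolutions that sample the spatial and spectral dimensions simultaneously, which yields fewer parameters and a leaner network. The method extracts joint spatial-spectral features from hyperspectral images, taking into account both the scarcity of labeled data and the balance between the high dimensionality of hyperspectral images and model depth. The doubleconvpool structure serves as a standard network component, and the number of such structures is treated as an important hyperparameter: for images with different data characteristics, good classification results are obtained by choosing an appropriate number of structures. This avoids substantial changes to the network's hyperparameters when the model is applied to a different dataset and improves the scalability of the network. The model takes 3D hyperspectral data as input, requires no complex preprocessing or postprocessing, and is trained end to end. For the same input size, the doubleconvpool structure requires fewer parameters than other deep learning models while effectively improving classification accuracy and efficiency. To avoid overfitting, the model adopts a series of regularization strategies, including BN (batch normalization)^[18], Dropout^[19], and the activation function^[20].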

1 Doubleconvpool-structured 3D-CNN for hyperspectral image classification

1.1 3D-CNN structure

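The most important advantage of a 2D-CNN^[21] is that it can extract effective features directly from raw 2D input images: each convolutional layer performs 2D convolution to extract local neighborhood features from the feature maps of the previous layer. To fully exploit deep learning's ability to extract hyperspectral image features automatically, a 3D-CNN performs convolution with 3D kernels that slide not only over the spatial domain but also along the spectral dimension, extracting spatial and spectral information simultaneously.

In a 3D-CNN, the value $v^{xyz}_{ij}$ of the neuron at position $(x, y, z)$ in the $j$th feature map of the $i$th layer is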

$$ v_{ij}^{xyz} = g\left( \sum_{m} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1} w_{ijm}^{pqr}\, v_{(i-1)m}^{(x+p)(y+q)(z+r)} + b_{ij} \right) $$

(1)

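where $i$ denotes the $i$th layer, $j$ denotes the $j$th feature map, $P_{i}$ and $Q_{i}$ are the height and width of the convolution kernel, $R_{i}$ is the size of the kernel along the spectral dimension, and $m$ indexes the feature maps of the previous layer that are connected to the current feature map (its range depends on the number of feature maps in each layer). $w^{pqr}_{ijm}$ is the weight connected to position $(p, q, r)$ of the $m$th feature map, and $b_{ij}$ is the bias of the $j$th feature map in the $i$th layer. $g$ is the activation function; this study uses the ReLU function, which is very effective on small datasets.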
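Eq. (1) can be read directly as nested loops over the connected input feature maps $m$ and the kernel offsets $(p, q, r)$. The pure-Python sketch below is a minimal illustration, not the authors' implementation: it assumes stride 1, no padding (only "valid" positions are computed), and nested lists for tensors, and the function names `conv3d_neuron` and `conv3d_map` are hypothetical.

```python
def relu(x: float) -> float:
    """ReLU activation g(x) = max(0, x), the g used in Eq. (1)."""
    return max(0.0, x)


def conv3d_neuron(prev_maps, weights, bias, x, y, z):
    """Value v_ij^{xyz} of one output neuron, following Eq. (1).

    prev_maps: list over m of 3D arrays prev_maps[m][x][y][z] (maps of layer i-1)
    weights:   list over m of kernels weights[m][p][q][r] = w_ijm^{pqr},
               each of shape P_i x Q_i x R_i
    bias:      b_ij
    """
    total = 0.0
    for m, kernel in enumerate(weights):            # sum over connected input maps
        P, Q, R = len(kernel), len(kernel[0]), len(kernel[0][0])
        for p in range(P):                          # kernel height
            for q in range(Q):                      # kernel width
                for r in range(R):                  # kernel depth (spectral axis)
                    total += kernel[p][q][r] * prev_maps[m][x + p][y + q][z + r]
    return relu(total + bias)


def conv3d_map(prev_maps, weights, bias):
    """Whole output feature map j of layer i ('valid' positions, stride 1)."""
    X, Y, Z = len(prev_maps[0]), len(prev_maps[0][0]), len(prev_maps[0][0][0])
    P, Q, R = len(weights[0]), len(weights[0][0]), len(weights[0][0][0])
    return [[[conv3d_neuron(prev_maps, weights, bias, x, y, z)
              for z in range(Z - R + 1)]
             for y in range(Y - Q + 1)]
            for x in range(X - P + 1)]
```

For example, convolving a single all-ones 4×4×4 input with an all-ones 3×3×3 kernel and bias −20 gives a 2×2×2 output filled with 27 − 20 = 7.0.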

1.2 Doubleconvpool structure

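A 3D-CNN deep learning model consists of multiple layers of nonlinear neurons and can learn representative high-level abstract features from sample images. Although CNNs have strong feature representation ability^[22], with small training sets the classification accuracy tends to decrease as more convolutional layers are added, because the deeper model overfits; yet restricting the depth may also limit accuracy. We therefore design a doubleconvpool structure for small-sample hyperspectral data (Fig. 1), consisting of two convolutional layers, two BN layers, and one pooling layer. No pooling layer is placed between the two consecutive convolutional layers, so that feature information is preserved and passed on. L2 regularization is applied to the 3D convolution kernels to prevent overfitting caused by dense feature extraction in the convolutional layers of a network of limited depth. A max-pooling layer^[23] follows the two convolutional layers. Pooling reduces the number of parameters while preserving feature invariance, which helps the model converge; however, because it is essentially a max operation, applying it too often damages spatial information. A BN layer after each convolution normalizes the data, and an activation function after each 3D convolution introduces nonlinearity, further increasing the representational capacity of the network. Experiments have shown that small 3×3×3 kernels are effective 3D kernels for learning 3D spatiotemporal features^[24]; therefore, all 3D kernels in this structure are 3×3 in the spatial dimensions, and the spectral sampling dimension is reduced progressively as the network deepens.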
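The pooling step that closes each doubleconvpool structure can be sketched as follows. This is an illustrative pure-Python version of non-overlapping 2×2×2 max pooling that assumes incomplete windows at the borders are discarded, so each dimension becomes ⌊n/2⌋ (for example, a 9×9×40 feature map shrinks to 4×4×20); the function name `max_pool3d` is hypothetical.

```python
def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling (pool size = stride = `size`).

    Voxels that do not fill a complete window are dropped ('valid' pooling),
    so each output dimension is floor(n / size).
    """
    X, Y, Z = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for x0 in range(0, X - size + 1, size):
        plane = []
        for y0 in range(0, Y - size + 1, size):
            row = []
            for z0 in range(0, Z - size + 1, size):
                # keep only the largest response inside the size^3 window
                row.append(max(volume[x][y][z]
                               for x in range(x0, x0 + size)
                               for y in range(y0, y0 + size)
                               for z in range(z0, z0 + size)))
            plane.append(row)
        out.append(plane)
    return out
```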

Fig. 1 Proposed doubleconvpool structure

1.3 Hyperspectral image feature extraction with the 3D-CNN

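The 3D-CNN feature extraction model uses an end-to-end deep learning framework. All samples are divided into three groups: a training set, a validation set, and a test set. The dataset contains $N$ labeled pixels, which are split into the sets $\mathit{\boldsymbol{X}}_{1}$, $\mathit{\boldsymbol{X}}_{2}$, and $\mathit{\boldsymbol{X}}_{3}$ with the corresponding one-hot encoded label sets $\mathit{\boldsymbol{Y}}_{1}$, $\mathit{\boldsymbol{Y}}_{2}$, and $\mathit{\boldsymbol{Y}}_{3}$. To make full use of the spectral and spatial information in the hyperspectral image, the model takes as input an $M \times N \times L$ neighborhood pixel block cropped from the raw data, where $M \times N$ is the spatial extent and $L$ is the number of spectral bands of the input image.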
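Cropping the $M \times N \times L$ neighborhood blocks can be sketched in a few lines. The sketch assumes zero padding (the text says only that the image borders are padded, not with which values), odd $M = N$, and a cube stored as nested lists indexed `[row][col][band]`; `pad_spatial` and `extract_patch` are hypothetical helper names. With $M = N = 11$, padding 5 pixels on each side turns a 145×145 image into 155×155, so every pixel, including those on the border, gets a full block centred on it.

```python
def pad_spatial(cube, pad):
    """Zero-pad an H x W x L cube by `pad` pixels on every spatial border."""
    H, W, L = len(cube), len(cube[0]), len(cube[0][0])
    zero_pixel = [0.0] * L
    padded = []
    for i in range(H + 2 * pad):
        row = []
        for j in range(W + 2 * pad):
            if pad <= i < H + pad and pad <= j < W + pad:
                row.append(list(cube[i - pad][j - pad]))   # copy of the original spectrum
            else:
                row.append(list(zero_pixel))               # padded border pixel
        padded.append(row)
    return padded


def extract_patch(padded, row, col, size):
    """size x size x L block centred on original pixel (row, col); `size` must be odd.

    With pad = size // 2, original pixel (row, col) sits at (row + pad, col + pad)
    in the padded cube, so the window starting at (row, col) is centred on it.
    """
    return [[padded[row + i][col + j] for j in range(size)] for i in range(size)]
```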

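After the model architecture is determined and the hyperparameters are configured, the model is trained on the training set $\mathit{\boldsymbol{X}}_{1}$ with labels $\mathit{\boldsymbol{Y}}_{1}$. The validation set $\mathit{\boldsymbol{X}}_{2}$ monitors the training process by evaluating the models produced during training; the network with the highest validation accuracy is selected and its weights are retained. Finally, the test set $\mathit{\boldsymbol{X}}_{3}$ is used to assess the classification ability of the trained model through classification metrics.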
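The model-selection procedure just described (train on $\mathit{\boldsymbol{X}}_{1}$, monitor on $\mathit{\boldsymbol{X}}_{2}$, keep the best weights) amounts to the loop below. It is a framework-agnostic sketch: `train_one_epoch` and `validation_accuracy` are hypothetical callables standing in for a real training step and a real validation pass. In Keras, the same behavior is commonly obtained with the `ModelCheckpoint` callback and `save_best_only=True`, although the paper does not say which mechanism was used.

```python
import copy
from typing import Callable, List, Tuple


def train_with_best_checkpoint(
    weights: dict,
    train_one_epoch: Callable[[dict], dict],
    validation_accuracy: Callable[[dict], float],
    epochs: int,
) -> Tuple[dict, float, List[float]]:
    """Train for `epochs` epochs and keep the weights with the best validation accuracy.

    train_one_epoch:     hypothetical; updates and returns weights using (X1, Y1).
    validation_accuracy: hypothetical; scores weights on (X2, Y2).
    """
    best_weights, best_acc, history = copy.deepcopy(weights), float("-inf"), []
    for _ in range(epochs):
        weights = train_one_epoch(weights)
        acc = validation_accuracy(weights)
        history.append(acc)
        if acc > best_acc:                      # strictly better: new checkpoint
            best_acc, best_weights = acc, copy.deepcopy(weights)
    return best_weights, best_acc, history
```

The returned `best_weights` are then the only ones evaluated on the test set $\mathit{\boldsymbol{X}}_{3}$.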

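Taking the Indian Pines dataset as an example, the proposed 3D-CNN model, shown in Fig. 2, contains three doubleconvpool structures, one fully connected layer, and a softmax classifier. The input block size is 11×11×200 pixels. Since the Indian Pines image is 145×145×200 pixels, the original image is padded at the borders to 155×155×200 pixels (5 pixels on each side) so that features at the image edges can also be fully extracted. Based on experiments, small 3×3×3 kernels are used in the doubleconvpool structures. Because the spectral dimension of hyperspectral images is redundant, the first convolutional layer uses a strided convolution along the spectral dimension to reduce dimensionality; the subsequent convolutional layers in the doubleconvpool structures use standard convolutions with stride 1 for feature extraction, and pooling is used for downsampling. The first doubleconvpool structure uses 16 kernels, the second 32, and the third 64; that is, each structure doubles the number of kernels of the previous one, a ratio widely adopted in CNN design^[24-25]. The input data are convolved with the 3D kernels of the convolutional layer, and the output passes through a BN layer and an activation function. A 2×2×2 pooling layer then downsamples the result. After the three doubleconvpool structures, the output is fed into a fully connected layer, which passes the classification information to the classifier. The network is trained with standard backpropagation, using the softmax (cross-entropy) loss as the loss function of the classifier. Table 1 lists the output size and parameter count of each layer for a neighborhood pixel block of 11×11×200 pixels, a spectral stride of 5 in the first convolutional layer, and three doubleconvpool structures.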
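The output sizes in Table 1 can be checked with the standard output-length rule $\lfloor (n-k)/s \rfloor + 1$ for input length $n$, kernel size $k$, and stride $s$. The worked check below assumes, as the sizes in Table 1 imply, that the first convolution is unpadded, that the second convolution of each structure preserves the size, and that 2×2×2 pooling rounds down:

$$
\begin{aligned}
\text{spatial:}\quad & \left\lfloor \tfrac{11-3}{1} \right\rfloor + 1 = 9 \;\xrightarrow{\text{pool}}\; \left\lfloor \tfrac{9}{2} \right\rfloor = 4 \;\xrightarrow{\text{pool}}\; 2 \;\xrightarrow{\text{pool}}\; 1,\\
\text{spectral:}\quad & \left\lfloor \tfrac{200-3}{5} \right\rfloor + 1 = 40 \;\xrightarrow{\text{pool}}\; 20 \;\xrightarrow{\text{pool}}\; 10 \;\xrightarrow{\text{pool}}\; 5,\\
\text{flatten:}\quad & 1 \times 1 \times 5 \times 64 = 320 .
\end{aligned}
$$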

Fig. 2 3D-CNN model structure on the Indian Pines dataset

Table 1 Output sizes and parameter counts of the 3D-CNN model on the Indian Pines dataset

Layer (type)	Output shape	Parameters
input_1 (InputLayer)	none, 11, 11, 200, 1	0
conv3d_1 (Conv3D)	none, 9, 9, 40, 16	448
batch_normalization_1 (Batch Normalization)	none, 9, 9, 40, 16	64
activation_1 (Activation)	none, 9, 9, 40, 16	0
conv3d_2 (Conv3D)	none, 9, 9, 40, 16	6 928
batch_normalization_2 (Batch Normalization)	none, 9, 9, 40, 16	64
activation_2 (Activation)	none, 9, 9, 40, 16	0
max_pooling3d_1 (MaxPooling3D)	none, 4, 4, 20, 16	0
conv3d_3 (Conv3D)	none, 4, 4, 20, 32	13 856
batch_normalization_3 (Batch Normalization)	none, 4, 4, 20, 32	128
activation_3 (Activation)	none, 4, 4, 20, 32	0
conv3d_4 (Conv3D)	none, 4, 4, 20, 32	27 680
batch_normalization_4 (Batch Normalization)	none, 4, 4, 20, 32	128
activation_4 (Activation)	none, 4, 4, 20, 32	0
max_pooling3d_2 (MaxPooling3D)	none, 2, 2, 10, 32	0
conv3d_5 (Conv3D)	none, 2, 2, 10, 64	55 360
batch_normalization_5 (Batch Normalization)	none, 2, 2, 10, 64	256
activation_5 (Activation)	none, 2, 2, 10, 64	0
conv3d_6 (Conv3D)	none, 2, 2, 10, 64	110 656
batch_normalization_6 (Batch Normalization)	none, 2, 2, 10, 64	256
activation_6 (Activation)	none, 2, 2, 10, 64	0
max_pooling3d_3 (MaxPooling3D)	none, 1, 1, 5, 64	0
flatten_1 (Flatten)	none, 320	0
dense_1 (Dense)	none, 128	41 088
batch_normalization_7 (Batch Normalization)	none, 128	512
activation_7 (Activation)	none, 128	0
dropout_1 (Dropout)	none, 128	0
dense_2 (Dense)	none, 16	2 064
Total parameters		259 488
Trainable parameters		258 784
Non-trainable parameters		704
Note: none denotes batch_size, a hyperparameter of the model, which is set to 16 in the experiments.
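The parameter totals in Table 1 can be reproduced from the layer definitions. The pure-Python sketch below is an illustrative check, not the authors' code; its layout assumptions (an unpadded first convolution with a spectral stride, size-preserving later convolutions, 2×2×2 pooling with floor rounding, and a Dense-BN-Dense head with 128 hidden units) are inferred from Table 1, and `count_params` is a hypothetical name. Each BN layer contributes four values per channel, two of which (the moving mean and variance) are not trained.

```python
def conv3d_params(k, c_in, c_out):
    """k*k*k*c_in*c_out weights plus one bias per output channel."""
    return k ** 3 * c_in * c_out + c_out


def bn_params(c):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * c


def count_params(patch=11, bands=200, blocks=3, first_stride=5, k=3,
                 base_filters=16, dense_units=128, n_classes=16):
    """Return (total, trainable, non_trainable) for a Table 1-style network."""
    s = patch - k + 1                        # spatial size after the unpadded first conv
    d = (bands - k) // first_stride + 1      # spectral size after the strided first conv
    total = non_trainable = 0
    c_in = 1
    for b in range(blocks):
        c_out = base_filters * 2 ** b        # 16, 32, 64, ... kernels per structure
        for _ in range(2):                   # two conv + BN pairs per structure
            total += conv3d_params(k, c_in, c_out) + bn_params(c_out)
            non_trainable += 2 * c_out
            c_in = c_out
        s, d = s // 2, d // 2                # 2x2x2 max pooling, floor rounding
    flat = s * s * d * c_in                  # length of the flattened feature vector
    total += flat * dense_units + dense_units + bn_params(dense_units)
    non_trainable += 2 * dense_units
    total += dense_units * n_classes + n_classes   # softmax output layer
    return total, total - non_trainable, non_trainable
```

With the defaults, `count_params()` returns `(259488, 258784, 704)`, matching Table 1; setting `first_stride` to 1, 10, or 15 gives 415 136, 234 912, and 226 720, the parameter counts listed in Table 3.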

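As Table 1 shows, the model has 259 488 parameters in total, of which 258 784 are trainable; the 704 non-trainable parameters are the moving means and variances of the BN layers. Compared with the existing algorithm^[17], the number of parameters is effectively controlled, which to some extent avoids the overfitting caused by training a model with too many parameters on a small hyperspectral dataset. The model also applies regularization strategies, including the activation function, Dropout, and L2 regularization. We choose the ReLU function, which performs well on small hyperspectral datasets. Dropout is applied after the fully connected layer to avoid overfitting: it sets the outputs of some hidden neurons to 0, so that the dropped neurons contribute to neither forward propagation nor backpropagation. Because neurons are dropped at random, the deep CNN forms a different network at each training stage, which prevents the complex co-adaptations that lead to overfitting^[25]; we set the Dropout rate to 0.5. L2 regularization is applied to the kernels of the 3D convolutions. It encourages the model to weigh all features in the final classification rather than rely too heavily on any single feature, improving the generalization of the network across datasets with different characteristics.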
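The two regularizers discussed above can be sketched as follows. The Dropout function uses the common "inverted" formulation (surviving units are scaled by $1/(1-p)$ during training so that nothing changes at inference), which is an assumption about the framework rather than something the paper states; the L2 coefficient `lam` is a hypothetical parameter, since the paper does not report its value, and both function names are illustrative.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate); a zeroed unit also passes no gradient back.
    At inference the input is returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_penalty(kernel_weights, lam):
    """L2 term lam * sum(w^2) added to the loss for one convolution kernel."""
    return lam * sum(w * w for w in kernel_weights)
```

With `rate=0.5`, every surviving activation is exactly doubled, so the expected value of each unit matches its inference-time value.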

2 Experimental datasets

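To evaluate the 3D-CNN model, two representative hyperspectral datasets are used: the Indian Pines test site scene and the Pavia University urban scene in Italy^[16]. Classification performance is measured by overall accuracy (OA), average accuracy (AA), and the Kappa coefficient.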
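The paper does not spell out the three metrics; the sketch below uses their standard definitions computed from a confusion matrix: OA is the fraction of correctly classified test pixels, AA is the mean of the per-class accuracies, and Kappa is $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is OA and $p_e$ is the agreement expected by chance. The function name is hypothetical, and the tables report all three values in percent.

```python
def classification_metrics(confusion):
    """OA, AA and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of test pixels of true class i predicted as class j.
    Classes with no test pixels are skipped when averaging per-class accuracy.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total                                     # overall accuracy
    per_class = [confusion[i][i] / sum(confusion[i])         # per-class accuracy
                 for i in range(n_classes) if sum(confusion[i]) > 0]
    aa = sum(per_class) / len(per_class)                     # average accuracy
    row_tot = [sum(confusion[i]) for i in range(n_classes)]
    col_tot = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

For the two-class matrix `[[8, 2], [1, 9]]` this gives OA = 0.85, AA = 0.85, and κ = 0.7 (up to floating-point rounding).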

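Indian Pines contains 16 land-cover classes, including grassland, buildings, and crops. The image is 145×145 pixels with a spatial resolution of 20 m and 220 bands in the 0.4–2.5 μm wavelength range. After removing 20 water-absorption and low-SNR bands, the remaining 200 bands are used in the experiments. The Pavia University dataset contains 9 land-cover classes, including roads, trees, and roofs. The image is 610×340 pixels with a spatial resolution of 1.3 m and 115 bands in the 0.43–0.86 μm wavelength range; after removing 12 bands affected by strong noise and water absorption, the remaining 103 bands are used in the experiments.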

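The ratio of training, validation, and test samples is 2 : 1 : 7 for Indian Pines and 1 : 1 : 8 for Pavia University. The samples of each class are randomly shuffled to ensure a random distribution, and the input data of both datasets are standardized to unit variance to ease numerical computation and speed up training.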
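A per-class random split in the stated ratios can be sketched as below. The paper says only that the samples of each class are shuffled and gives the overall ratios, so applying the ratios within every class, the rounding rule, and the fixed seed are all assumptions; `stratified_split` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(2, 1, 7), seed=0):
    """Split sample indices class by class into train/validation/test lists.

    labels: class label of each labeled pixel (list index = sample id).
    ratios: relative sizes, e.g. (2, 1, 7) for Indian Pines, (1, 1, 8) for Pavia University.
    Each class is shuffled independently and cut by the ratios; sizes are rounded
    down with int(), so any remainder goes to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    total = sum(ratios)
    train, val, test = [], [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)                                # random order within the class
        n_train = int(len(idxs) * ratios[0] / total)
        n_val = int(len(idxs) * ratios[1] / total)
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

With 100 samples of class 0, 50 of class 1, and ratios 2 : 1 : 7, the split yields 30 training, 15 validation, and 105 test samples.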

3 Discussion of experimental parameters

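For the doubleconvpool-structured 3D-CNN classification model, we configure the training parameters and update the weights of the 3D convolutions by backpropagating the gradient of the loss function. We focus on six factors that control training and affect classification performance: the number of doubleconvpool structures, the regularization strategy, the spectral sampling stride of the first convolutional layer, the kernel size, the neighborhood pixel block size, and the learning rate. Because the training sets are small, a batch size of 16 is used, and the loss is optimized with the RMSprop optimizer. During training, the model that performs best on the validation set is retained and used to evaluate the test set. The experiments were run on a desktop computer with an Intel i5-8500k CPU, a GTX 1080 Ti GPU, and 16 GB of memory. Training and test times directly reflect the training efficiency and computational cost of a model; compared with reference [17], our model achieves high classification accuracy while saving a substantial amount of computation, which demonstrates its efficiency.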
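For reference, one step of the standard RMSprop rule is sketched below with the learning rate reported in Section 3.6 (0.000 3). The decay `rho=0.9` and `eps=1e-7` are assumed values, as the paper does not report them, and `rmsprop_step` is a hypothetical name. Dividing by the running root-mean-square of the gradient makes the step size largely independent of the gradient's scale.

```python
import math


def rmsprop_step(w, grad, cache, lr=3e-4, rho=0.9, eps=1e-7):
    """One RMSprop update for a list of parameters.

    cache <- rho * cache + (1 - rho) * grad^2       (running mean of squared gradients)
    w     <- w - lr * grad / (sqrt(cache) + eps)
    Returns the new (w, cache).
    """
    new_cache = [rho * c + (1 - rho) * g * g for c, g in zip(cache, grad)]
    new_w = [wi - lr * g / (math.sqrt(c) + eps)
             for wi, g, c in zip(w, grad, new_cache)]
    return new_w, new_cache
```

For example, starting from `cache = [0.0]`, a first step with gradient 2 and one with gradient 200 move the parameter by almost exactly the same amount, about 9.5 × 10⁻⁴.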

3.1 Effect of the number of doubleconvpool structures on classification accuracy

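The final model uses three doubleconvpool structures on Indian Pines and two on Pavia University. Compared with Indian Pines, the Pavia University dataset has more samples, fewer classes, and relatively less noise, so classification is less demanding and two doubleconvpool structures already achieve good accuracy. Fig. 3 shows how the overall accuracy changes with the number of doubleconvpool structures on the two datasets. On Pavia University, the accuracy oscillates somewhat at shallow depths but peaks with two structures. The trend is more pronounced on Indian Pines: accuracy drops noticeably with two structures, peaks with three, and declines beyond three.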

Fig. 3 Overall accuracy versus the number of doubleconvpool structures on the two datasets

3.2 Effect of the regularization strategy on classification accuracy

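Limited training samples often cause deep models to overfit. To prevent this, BN and Dropout layers are used in the model and L2 regularization is applied to the convolution kernels as regularization methods during training. Table 2 reports the performance of the model on the Indian Pines and Pavia University datasets without regularization, with BN only, with Dropout only, and with both BN and Dropout.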

Table 2 Overall accuracy (%) of different regularization methods on the two datasets

Regularization	Indian Pines	Pavia University
none	98.32	99.60
BN	99.11	99.50
Dropout	99.29	99.51
BN+Dropout	99.16	99.74

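On Indian Pines, accuracy is lowest without BN and Dropout and highest with Dropout alone. One possible reason is that the proposed 3D-CNN is not very deep, so the risk of overfitting is small; another is that the model already uses a relatively large number of regularization strategies, so adding BN on top of Dropout brings no further benefit. On Pavia University, the model using both BN and Dropout achieves the highest overall accuracy.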

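Figs. 4 and 5 show the training and validation accuracy and loss with different regularization strategies on Indian Pines and Pavia University, respectively. Fig. 4 shows that with both BN and Dropout, accuracy and loss change more smoothly during training and validation, which helps the model converge. Fig. 5 shows that with both BN and Dropout, the training and validation losses converge more stably.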

Fig. 4 Accuracy and loss on the training and validation sets for different regularization strategies on the Indian Pines dataset

((a) without regularization; (b) BN only; (c) Dropout only; (d) BN and Dropout)

Fig. 5 Accuracy and loss on the training and validation sets for different regularization strategies on the Pavia University dataset

((a) without regularization; (b) BN only; (c) Dropout only; (d) BN and Dropout)

3.3 Effect of the spectral sampling stride on classification accuracy

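Hyperspectral images contain some redundancy in the spectral dimension, which is also the theoretical basis of the many methods that combine PCA with 2D-CNNs. A 3D-CNN obtains spatial-spectral information by sampling the spatial and spectral dimensions simultaneously. In the doubleconvpool structure, the 3D kernels have the small size 3×3×3 and a stride of 1, so that the kernel visits every pixel. However, a spectral stride of 1 greatly increases the number of parameters and the amount of computation, mainly because the longer spectral axis enlarges the flattened feature vector that enters the fully connected layer. Given the spectral redundancy, we examine different spectral strides for the first convolutional layer to balance accuracy, training speed, and computation.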
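To see where the extra parameters come from, the spectral length can be traced through the Indian Pines network of Table 1 with a first-layer spectral stride $s$ of 5 and of 1. The check assumes, as Table 1 implies, an unpadded first convolution, size-preserving later convolutions, and 2×2×2 pooling with floor rounding; since the final spatial size is 1×1, only the first fully connected layer (64 channels into 128 units) changes:

$$
\begin{aligned}
s=5:&\quad \left\lfloor\tfrac{200-3}{5}\right\rfloor+1=40 \to 20 \to 10 \to 5, & 5\times64\times128+128 &= 41\,088,\\
s=1:&\quad \left\lfloor\tfrac{200-3}{1}\right\rfloor+1=198 \to 99 \to 49 \to 24, & 24\times64\times128+128 &= 196\,736,
\end{aligned}
$$

and the difference $196\,736 - 41\,088 = 155\,648$ equals $415\,136 - 259\,488$, the gap between the first two rows of Table 3.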

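Tables 3 and 4 compare the number of parameters, training time, test time, and overall accuracy for different spectral strides on Indian Pines and Pavia University, respectively. Table 3 shows that on Indian Pines, a spectral stride of 5 in the first convolutional layer reduces the number of parameters markedly compared with a stride of 1 (from 415 136 to 259 488), and training and test times also drop significantly (from 784.78 s to 475.41 s and from 4.32 s to 2.83 s). Although overall accuracy decreases slightly (from 99.40% to 99.16%), this trade-off is valuable for real-time classification. With strides of 10 and 15, the accuracy loss is larger, whereas the further reductions in parameters, training time, and test time are small. A first-layer stride of 5 is therefore a good reference choice for dimensionality reduction. Table 4 shows that a first-layer spectral stride of 5 is also a good choice on Pavia University: compared with a stride of 1, it cuts the training time from 1 161.13 s to 747.99 s and the test time from 25.01 s to 8.08 s at a cost of only 0.07 percentage points of overall accuracy.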

Table 3 Parameter counts, training time, test time, and overall accuracy of the model with different first-layer spectral strides on the Indian Pines dataset

Spectral stride	Parameters	Training time/s	Test time/s	OA/%
1	415 136	784.78	4.32	99.40
5	259 488	475.41	2.83	99.16
10	234 912	455.05	2.67	97.82
15	226 720	439.37	2.62	98.06

Table 4 Parameter counts, training time, test time, and overall accuracy of the model with different first-layer spectral strides on the Pavia University dataset

Spectral stride	Parameters	Training time/s	Test time/s	OA/%
1	460 697	1 161.13	25.01	99.81
5	133 017	747.99	8.08	99.74
10	83 865	722.51	7.72	99.61
15	67 481	696.76	7.45	99.61

3.4 Effect of kernel size on classification accuracy

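In conventional CNN design, small kernels are generally preferred and kernel sizes are usually odd. An even kernel size samples features asymmetrically around the center, so edge information is not fully covered by the kernel and is underused; an odd kernel is symmetric about the center pixel and gathers information equally from both sides. Tables 5 and 6 compare the number of parameters, training and test times, and overall accuracy for different kernel sizes on Indian Pines and Pavia University, respectively.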

Table 5 Parameter counts, training and test times, and overall accuracy for different convolution kernel sizes on the Indian Pines dataset

Kernel size	Parameters	Training time/s	Test time/s	OA/%
3×3×3	259 488	475.41	2.83	99.16
5×5×5	1 079 744	510.71	2.98	98.71
7×7×7	2 805 088	613.44	3.36	96.65

Table 6 Parameter counts, training and test times, and overall accuracy for different convolution kernel sizes on the Pavia University dataset

Kernel size	Parameters	Training time/s	Test time/s	OA/%
3×3×3	133 017	747.99	8.08	99.74
5×5×5	248 761	750.46	8.38	99.41
7×7×7	642 905	869.24	9.22	99.36

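Tables 5 and 6 report results for three kernel sizes: 3×3×3, 5×5×5, and 7×7×7. The latter two increase the number of parameters substantially, placing a heavy burden on memory and the GPU; they also lengthen training and test times relative to 3×3×3 kernels, and overall accuracy decreases, markedly so on Indian Pines. Small kernels thus have a clear advantage for image feature extraction. Indian Pines uses three doubleconvpool structures and Pavia University two, and the results show that the advantage of small kernels becomes more pronounced as the number of doubleconvpool structures increases: with 7×7×7 kernels, accuracy drops from 99.16% to 96.65% on Indian Pines but only from 99.74% to 99.36% on Pavia University. The 3×3×3 configuration uses a spectral stride of 5 and a neighborhood pixel block of 11×11 pixels.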
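The parameter growth follows from the kernel volume: the weight count of a cubic kernel scales with the cube of its side length $k$. For the last convolution of Table 1 (64 input and 64 output channels), for example, the weights alone (biases excluded) are

$$
k^3 \cdot 64 \cdot 64:\qquad 3^3 \cdot 4096 = 110\,592,\qquad 5^3 \cdot 4096 = 512\,000,\qquad 7^3 \cdot 4096 = 1\,404\,928,
$$

that is, about 4.6 and 12.7 times the 3×3×3 count for that layer.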

3.5 Effect of the neighborhood pixel block size on classification accuracy

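Taking Indian Pines with an 11×11 block as an example, the network pads the 145×145×200-pixel input image at the borders to 155×155×200 pixels. From this padded image, an $M \times N \times L$ neighborhood pixel block is extracted around each pixel in turn, where $M \times N$ is the spatial sampling size and $L$ covers all spectral bands. Feeding the whole original image is impractical: it hinders thorough convolutional feature extraction, slows execution, temporarily increases memory usage, and places high demands on hardware, so neighborhood pixel blocks are used as input instead. The block size is an important hyperparameter, but it must not be too small; otherwise the receptive field of the kernels is insufficient and the locally extracted features are poor.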

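Tables 7 and 8 compare training and test times and overall accuracy for different neighborhood pixel block sizes on Indian Pines and Pavia University, respectively. On Indian Pines, accuracy improves clearly as the block size increases from 9 to 19, but training and test times grow accordingly; the same pattern holds on Pavia University. As the block grows, the accuracy gains become progressively smaller, showing a clear saturation (threshold) effect.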

Table 7 Training and test times and overall accuracy for different neighborhood pixel block sizes on the Indian Pines dataset

Block size ($M=N$)/pixel	Training time/s	Test time/s	OA/%
9	401.78	2.06	98.42
11	475.41	2.83	99.16
13	561.63	3.63	99.32
15	654.36	4.47	99.62
17	755.16	5.22	99.64
19	883.52	6.41	99.65

Table 8 Training and test times and overall accuracy for different neighborhood pixel block sizes on the Pavia University dataset

Block size ($M=N$)/pixel	Training time/s	Test time/s	OA/%
9	684.83	6.77	99.24
11	747.99	8.08	99.74
13	866.54	10.38	99.63
15	1 000.74	12.60	99.75
17	1 153.53	15.14	99.82
19	1 351.64	48.50	99.80

3.6 Effect of the learning rate on classification accuracy

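The learning rate of the gradient descent algorithm controls the step size of the network weight updates, that is, how quickly the parameters move toward the optimum. For gradient descent to perform well, the learning rate must lie within a suitable range: if it is too large, the updates may overshoot the optimum; if it is too small, optimization may be so slow that the algorithm fails to converge for a long time, directly degrading performance. Experiments show that the optimal learning rate is 0.000 3 for both the Indian Pines and Pavia University datasets.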
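The trade-off described above can be seen on a toy problem. The sketch runs plain gradient descent (not RMSprop) on the one-dimensional quadratic $f(w) = \tfrac{c}{2}w^2$, a stand-in for a loss surface with curvature $c$; it is an illustration under that assumption, not a reproduction of the paper's experiments, and `gradient_descent` is a hypothetical name.

```python
def gradient_descent(lr, w0=1.0, steps=50, curvature=2.0):
    """Plain gradient descent on f(w) = curvature / 2 * w**2 (minimum at w = 0).

    Each step is w <- w - lr * f'(w) = (1 - lr * curvature) * w, so the iterates
    converge only if |1 - lr * curvature| < 1, i.e. 0 < lr < 2 / curvature.
    """
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w     # gradient of f is curvature * w
    return w
```

With `curvature=2.0`, a learning rate of 0.1 shrinks $w$ by a factor of 0.8 per step and converges, 0.001 barely moves it in 50 steps (about 0.90 remains), and 1.1 overshoots the minimum with growing oscillations.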

4 Experimental results and analysis

4.1 Results on the Indian Pines dataset

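For Indian Pines, the 3D-CNN model takes 19×19×200-pixel input blocks and uses a spectral stride of 5 for the first 3×3×3 convolution, three doubleconvpool structures, one fully connected layer, and a softmax classifier, with a learning rate of 0.000 3. To evaluate the model, the results are compared with SVM^[26], SAE^[27], CNN^[28], and SSRN^[17] (Table 9). SVM is one of the most effective traditional methods for hyperspectral classification; SAE is an unsupervised learning method that is well suited to small hyperspectral training sets; the CNN baseline is a typical 3D-CNN without the proposed doubleconvpool structure; and SSRN is a 3D-CNN feature extraction model built on residual structures. Table 9 shows that the doubleconvpool-structured 3D-CNN outperforms the four comparison methods in OA, AA, and Kappa, reaching an overall accuracy of 99.65%. Both SVM and SSRN achieve 0 accuracy on class 9 (oats), whose training set is very small; SSRN fails to recognize this class at all, which indicates that its classification is unstable. Fig. 6 shows the classification maps of the five methods.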

Table 9 Classification accuracy (%) of different methods on the Indian Pines dataset

Class	SVM	SAE	CNN	SSRN	Ours
1	96.78	81.82	100	100	100
2	78.74	82.16	97.27	99.90	99.51
3	82.26	77.54	98.00	99.83	100
4	99.03	68.11	92.81	100	99.40
5	93.75	94.36	99.25	99.71	99.71
6	85.96	94.45	99.52	99.66	100
7	40.00	94.70	97.58	100	100
8	91.80	94.36	99.0	100	99.70
9	0	82.56	96.95	0	100
10	86.00	81.28	95.38	100	99.56
11	70.94	84.47	97.72	99.53	99.94
12	74.73	83.77	97.13	99.52	98.57
13	99.04	96.42	99.65	100	100
14	94.29	92.27	97.95	98.76	99.55
15	85.11	80.63	92.30	100	100
16	96.78	81.82	100	98.53	95.59
OA	81.67	85.47	97.41	99.62	99.65
AA	79.84	86.31	97.39	93.46	99.47
Kappa	78.76	83.42	97.05	99.57	99.60

Fig. 6 Classification results of the models on the Indian Pines dataset

((a) false-color image; (b) ground-truth labels; (c) SVM; (d) SAE; (e) CNN; (f) SSRN; (g) our method)

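On Indian Pines, the doubleconvpool-structured 3D-CNN is compared with the conventional stacked 3D-CNN and SSRN in terms of number of parameters, training and test times, and overall accuracy (Table 10). The conventional stacked CNN has far too many parameters, and its computation and memory requirements are much higher than those of SSRN and our method. SSRN has a cumbersome design: it uses consecutive spatial and spectral residual blocks to extract spatial and spectral features separately, which makes the design costly and limits its scalability to different hyperspectral datasets. Our doubleconvpool-based 3D-CNN has a simple structure and can use different numbers of structures for different datasets, which improves its scalability. Fig. 7 shows the training and validation accuracy and loss of the three models. The initial training loss of the CNN is very high; the loss of our method fluctuates somewhat but eventually stabilizes.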

Table 10 Parameter counts, training and test times, and overall accuracy of CNN, SSRN, and our method on the Indian Pines dataset

Method	Parameters	Training time/s	Test time/s	OA/%
CNN	16 394 638	997.84	4.72	97.41
SSRN	346 784	930.98	4.84	99.62
Ours	259 488	883.52	6.42	99.65

Fig. 7 Training and validation accuracy and loss of CNN, SSRN, and our method on the Indian Pines dataset ((a) CNN; (b) SSRN; (c) our method)

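4.2 Results on the Pavia University dataset

On Pavia University, the 3D-CNN model takes 17×17×103-pixel input blocks and uses a spectral stride of 5 for the first 3×3×3 convolution, two doubleconvpool structures, one fully connected layer, and a softmax classifier, with a learning rate of 0.000 3.

Because the Pavia University dataset is relatively large compared with Indian Pines, features can be extracted sufficiently, and using more doubleconvpool structures did not noticeably improve accuracy in our experiments; two doubleconvpool structures are therefore used. SVM, SAE, CNN, and SSRN are again used for comparison (Table 11). With sufficient data, the doubleconvpool structure achieves the highest OA and AA among the compared methods, with an overall accuracy of 99.82% on Pavia University. Fig. 8 shows the classification maps of the methods.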

Table 11 Classification accuracy (%) of different methods on the Pavia University dataset

Class	SVM	SAE	CNN	SSRN	Ours
1	87.24	94.59	98.98	99.96	99.64
2	89.93	96.44	99.45	99.99	99.99
3	86.48	84.57	96.04	99.64	99.94
4	99.95	97.37	99.58	99.42	99.83
5	95.78	99.60	99.39	100	99.81
6	97.69	93.39	99.70	99.18	99.98
7	95.44	88.57	97.18	99.82	97.97
8	84.40	85.66	95.73	99.76	99.56
9	100	99.88	99.56	100	100
OA	90.58	94.25	98.85	99.79	99.82
AA	92.99	93.34	98.40	99.75	99.76
Kappa	87.21	92.35	98.47	99.87	99.64

Fig. 8 Classification results of the models on the Pavia University dataset

((a) false-color image; (b) ground-truth labels; (c) SVM; (d) SAE; (e) CNN; (f) SSRN; (g) our method)

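On Pavia University, the doubleconvpool-structured 3D-CNN is again compared with the conventional stacked 3D-CNN and SSRN in terms of number of parameters, training and test times, and overall accuracy (Table 12). Our method achieves the highest overall accuracy. Fig. 9 shows the training and validation accuracy and loss of the three models: SSRN fluctuates slightly more than the other two, and all three achieve good accuracy on the validation set.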

Table 12 Parameter counts, training and test times, and overall accuracy of CNN, SSRN, and our method on the Pavia University dataset

Method	Parameters	Training time/s	Test time/s	OA/%
CNN	16 394 648	2 221.31	20.07	98.85
SSRN	199 453	1 518.86	14.41	99.79
Ours	235 417	1 153.53	15.14	99.82

Fig. 9 Training and validation accuracy and loss of CNN, SSRN, and our method on the Pavia University dataset ((a) CNN; (b) SSRN; (c) our method)

5 Conclusion

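Taking the 3D-CNN as the basic architecture, we designed a doubleconvpool-structured method for hyperspectral remote sensing image classification. The network accounts for both the scarcity of labeled hyperspectral data and the balance between the high dimensionality of hyperspectral images and model depth, and it makes full use of the semantic information provided by joint spatial-spectral features. The doubleconvpool structure can be combined and reused according to the characteristics of a dataset, and the model is trained end to end without complex preprocessing or postprocessing. In comparisons with SVM, SAE, and current mainstream CNN methods, our model achieved overall accuracies of 99.65% and 99.82% on the Indian Pines and Pavia University datasets, respectively, effectively improving land-cover classification accuracy for hyperspectral images while requiring fewer parameters and offering higher computational efficiency than other deep learning models. We expect this design to find wide application in future hyperspectral image classification.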

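3D-CNNs can extract the spatial and spectral information of hyperspectral remote sensing images simultaneously, and we will further study 3D-CNN-based hyperspectral classification methods in two directions: 1) from the perspective of feature extraction effectiveness, design deeper feature extraction models that incorporate effective feature aggregation schemes such as dense and residual connections, making 3D-CNNs deeper and accelerating information flow; 2) given the characteristics of hyperspectral images, such as sparse but high-dimensional land-cover information, introduce attention mechanisms to strengthen the relationships between feature maps. Improving both model depth and feature effectiveness should further advance the application of deep learning, especially convolutional neural networks, to remote sensing image classification.

References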

[1] Du P J, Xia J S, Xue Z H, et al. Review of hyperspectral remote sensing image classification[J]. Journal of Remote Sensing, 2016, 20(2): 236–256. [DOI:10.11834/jrs.20165022]

[2] Zhang B. Intelligent remote sensing satellite system[J]. Journal of Remote Sensing, 2011, 15(3): 415–431. [DOI:10.11834/jrs.20110354]

[3] Chang C I. Hyperspectral Imaging:Techniques for Spectral Detection and Classification[M]. New York: Springer, 2003.

[4] Zhu J Z, Shi Q, Chen F E, et al. Research status and development trends of remote sensing big data[J]. Journal of Image and Graphics, 2016, 21(11): 1425–1439. [DOI:10.11834/jig.20161102]

[5] Licciardi G, Marpu P R, Chanussot J, et al. Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles[J]. IEEE Geoscience and Remote Sensing Letters, 2012, 9(3): 447–451. [DOI:10.1109/LGRS.2011.2172185]

[6] Villa A, Benediktsson J A, Chanussot J, et al. Hyperspectral image classification with independent component discriminant analysis[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(12): 4865–4876. [DOI:10.1109/TGRS.2011.2153861]

[7] Lunga D, Prasad S, Crawford M M, et al. Manifold-learning-based feature extraction for classification of hyperspectral data:a review of advances in manifold learning[J]. IEEE Signal Processing Magazine, 2014, 31(1): 55–66. [DOI:10.1109/MSP.2013.2279894]

[8] Fauvel M, Tarabalka Y, Benediktsson J A, et al. Advances in spectral-spatial classification of hyperspectral images[J]. Proceedings of the IEEE, 2013, 101(3): 652–675. [DOI:10.1109/JPROC.2012.2197589]

[9] Du P J, Xue Z H, Li J, et al. Learning discriminative sparse representations for hyperspectral image classification[J]. IEEE Journal of Selected Topics in Signal Processing, 2015, 9(6): 1089–1104. [DOI:10.1109/JSTSP.2015.2423260]

[10] Song B Q, Li J, Dalla Mura M, et al. Remotely sensed image classification using sparse representations of morphological attribute profiles[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(8): 5122–5136. [DOI:10.1109/TGRS.2013.2286953]

[11] Li J, Bioucas-Dias J M, Plaza A. Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2010, 48(11): 4085–4098. [DOI:10.1109/TGRS.2010.2060550]

[12] Zhang L P, Zhang L F, Du B. Deep learning for remote sensing data:a technical tutorial on the state of the art[J]. IEEE Geoscience and Remote Sensing Magazine, 2016, 4(2): 22–40. [DOI:10.1109/MGRS.2016.2540798]

[13] Chen Y S, Zhao X, Jia X P. Spectral-spatial classification of hyperspectral data based on deep belief network[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(6): 2381–2392. [DOI:10.1109/JSTARS.2015.2388577]

[14] Zhao W Z, Du S H. Spectral-spatial feature extraction for hyperspectral image classification:a dimension reduction and deep learning approach[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(8): 4544–4554. [DOI:10.1109/TGRS.2016.2543748]

[15] Yue J, Zhao W Z, Mao S J, et al. Spectral-spatial classification of hyperspectral images using deep convolutional neural networks[J]. Remote Sensing Letters, 2015, 6(6): 468–477. [DOI:10.1080/2150704X.2015.1047045]

[16] Makantasis K, Karantzalos K, Doulamis A, et al. Deep supervised learning for hyperspectral data classification through convolutional neural networks[C]//2015 IEEE International Geoscience and Remote Sensing Symposium. Milan, Italy: IEEE, 2015: 4959–4962. [DOI:10.1109/IGARSS.2015.7326945]

[17] Zhong Z L, Li J, Luo Z M, et al. Spectral-spatial residual network for hyperspectral image classification:a 3-D deep learning framework[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(2): 847–858. [DOI:10.1109/TGRS.2017.2755542]

[18] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning. Lille, France: JMLR.org, 2015: 448-456.

[19] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout:a simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929–1958.

[20] Xu B, Wang N Y, Chen T Q, et al. Empirical evaluation of rectified activations in convolutional network[EB/OL].[2018-06-20]. https://arxiv.org/pdf/1505.00853.pdf.

[21] Hara K, Kataoka H, Satoh Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[EB/OL].[2018-06-20]. https://arxiv.org/pdf/1711.09577.pdf.

[22] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 4489–4497. [DOI:10.1109/ICCV.2015.510]

[23] Boureau Y L, Ponce J, LeCun Y. A theoretical analysis of feature pooling in visual recognition[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: The International Machine Learning Society, 2010: 111-118.

[24] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012: 1097-1105.

[25] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//Proceedings of the 13th European Conference on Computer Vision (ECCV 2014). Zurich, Switzerland: Springer, 2014: 346–361. [DOI:10.1007/978-3-319-10578-9_23]

[26] Tarabalka Y, Fauvel M, Chanussot J, et al. SVM- and MRF-based method for accurate classification of hyperspectral images[J]. IEEE Geoscience and Remote Sensing Letters, 2010, 7(4): 736–740. [DOI:10.1109/LGRS.2010.2047711]

[27] Chen Y S, Lin Z H, Zhao X, et al. Deep learning-based classification of hyperspectral data[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(6): 2094–2107. [DOI:10.1109/JSTARS.2014.2329330]

[28] Chen Y S, Jiang H L, Li C Y, et al. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(10): 6232–6251. [DOI:10.1109/TGRS.2016.2584107]