Published: 2021-08-16
DOI: 10.11834/jig.210146
2021 | Volume 26 | Number 8

高光谱图像分类

3D卷积自编码器高光谱图像分类模型

石延新¹, 何进荣¹, 李照奎², 曾志高³

1. 延安大学数学与计算机科学学院, 延安 716000;

2. 沈阳航空航天大学计算机学院, 沈阳 110136;

3. 湖南工业大学计算机与通信学院, 株洲 412000

Received: 2021-03-16; Revised: 2021-05-24; Preprint: 2021-05-31

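Funding: National Natural Science Foundation of China (61902339); Natural Science Basic Research Program of Shaanxi Province (2021JM-418); Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data, a province-city co-constructed laboratory (IPBED14); Yan'an Science and Technology Special Project (2019-01, 2019-13); Google-supported Ministry of Education Industry-University Cooperative Education Program, student project (202002107065); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400); Natural Science Foundation of Hunan Province (2018JJ2098); Yan'an University 2020 Provincial Innovation and Entrepreneurship Training Program (S202010719116, S202010719068)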

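About the authors: Shi Yanxin, born in 2000, male, undergraduate student. His main research interests are deep learning and remote sensing image processing. E-mail: shiyanxin2000@163.com
He Jinrong, corresponding author, male, associate professor. His main research interests are machine learning and computer vision. E-mail: hejinrong@yau.edu.cn
Li Zhaokui, male, professor. His main research interests are artificial intelligence and pattern recognition, computer vision, and remote sensing image processing. E-mail: lzk@sau.edu.cn
Zeng Zhigao, male, professor. His main research interests are computer vision, intelligent computing, and pattern recognition. E-mail: zzgzzg99@163.com
*Corresponding author: He Jinrong hejinrong@yau.edu.cn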

CLC number: TP237

Document code: A

Article ID: 1006-8961(2021)08-2021-16

摘要

目的高光谱图像分类是遥感领域的基础问题，高光谱图像同时包含丰富的光谱信息和空间信息，传统模型难以充分利用两种信息之间的关联性，而以卷积神经网络为主的有监督深度学习模型需要大量标注数据，但标注数据难度大且成本高。针对现有模型的不足，本文提出了一种无监督范式下的高光谱图像空谱融合方法，建立了3D卷积自编码器（3D convolutional auto-encoder，3D-CAE）高光谱图像分类模型。方法 3D卷积自编码器由编码器、解码器和分类器构成。将高光谱数据预处理后，输入到编码器中进行无监督特征提取，得到一组特征图。编码器的网络结构为3个卷积块构成的3D卷积神经网络，卷积块中加入批归一化技术防止过拟合。解码器为逆向的编码器，将提取到的特征图重构为原始数据，用均方误差函数作为损失函数判断重构误差并使用Adam算法进行参数优化。分类器由3层全连接层组成，用于判别编码器提取到的特征。以3D-CNN（three dimensional convolutional neural network）为自编码器的主干网络可以充分利用高光谱图像的空间信息和光谱信息，做到空谱融合。以端到端的方式对模型进行训练可以省去复杂的特征工程和数据预处理，模型的鲁棒性和稳定性更强。结果在Indian Pines、Salinas、Pavia University和Botswana等4个数据集上与7种传统单特征方法及深度学习方法进行了比较，本文方法均取得最优结果，总体分类精度分别为0.948 7、0.986 6、0.986 2和0.964 9。对比实验结果表明了空谱融合和无监督学习对于高光谱遥感图像分类的有效性。结论本文模型充分利用了高光谱图像的光谱特征和空间特征，可以做到无监督特征提取，无需大量标注数据的同时分类精度高，是一种有效的高光谱图像分类方法。

关键词

遥感图像分类; 空谱特征融合; 3D-CNN; 自编码器; 卷积神经网络(CNN); 深度学习

Hyperspectral image classification model based on 3D convolutional auto-encoder

Shi Yanxin¹, He Jinrong¹, Li Zhaokui², Zeng Zhigao³

1. College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China;

2. School of Computer, Shenyang Aerospace University, Shenyang 110136, China;

3. College of Computer and Communication, Hunan University of Technology, Zhuzhou 412000, China

Supported by: National Natural Science Foundation of China(61902339)

Abstract

Objective Hyperspectral image classification is a basic problem in remote sensing and a long-standing research hotspot. Hyperspectral images contain rich spectral and spatial information, and using both kinds of features can improve classification accuracy. Early traditional models, such as the support vector machine and decision trees, cannot fully use both kinds of information. With the development of deep learning, a growing number of studies use convolutional neural networks to extract features from hyperspectral images. However, a two-dimensional convolutional neural network (2D-CNN) extracts only the spatial features of hyperspectral images and cannot fully use the band information of remote sensing data, whereas a 3D-CNN extracts spectral and spatial features simultaneously. Recurrent neural networks are hampered in hyperspectral image classification by the difficulty of finding the optimal sequence length and by over-fitting. Current research focuses on supervised deep learning models, which need a substantial amount of labeled data to be trained effectively, yet labeling data is difficult and costly in practice. A model should therefore perform well without abundant labels. To address the problems that existing models cannot fully use spatial and spectral information and require a large amount of labeled data for training, a spatial-spectral fusion method for hyperspectral images under the unsupervised paradigm is proposed, and a hyperspectral image classification model based on a 3D convolutional auto-encoder is established. Method The proposed 3D convolutional auto-encoder (3D-CAE) is composed of an encoder, a decoder, and a classifier. After data pre-processing, the hyperspectral image is input into the encoder for unsupervised feature extraction, which produces a set of feature maps. The encoder is a 3D convolutional neural network with three convolution blocks, each made up of two convolution layers, batch normalization, and a max-pooling layer. Batch normalization is added to the convolution blocks to prevent over-fitting. The decoder is an inverted encoder that reconstructs the extracted feature maps into the original data; the mean squared error is used as the loss function to measure the reconstruction error, and the parameters are optimized with the Adam algorithm. The classifier consists of three fully connected layers, with ReLU as the activation function of the first two layers, and classifies the features extracted by the encoder. Using a 3D-CNN as the backbone of the auto-encoder makes full use of the spatial and spectral information of hyperspectral images and achieves spatial-spectral fusion. The model is trained end to end, which removes the need for complex feature engineering and data pre-processing and makes it more robust and stable. Result The proposed method is compared with seven traditional single-feature and deep learning methods on the Indian Pines, Salinas, Pavia University, and Botswana datasets and achieves the best results on all four. The overall classification accuracies are 0.948 7, 0.986 6, 0.986 2, and 0.964 9, the average classification accuracies are 0.936 0, 0.992 4, 0.982 9, and 0.965 9, and the Kappa values are 0.941 5, 0.985 1, 0.981 7, and 0.962 0, respectively. The comparative results show that spatial-spectral fusion and unsupervised learning are effective for hyperspectral remote sensing image classification. Because 3D-CAE is composed of an auto-encoder and a classifier, an ablation experiment is added: with the same auto-encoder, four classifiers with different structures are used for classification. The results are stable, which supports the validity of the auto-encoder. Five training-set proportions (5%, 8%, 10%, 15%, and 20%) are used to assess the generalization of 3D-CAE. The losses of the auto-encoder and the classifier on the four datasets remain stable and low, with no oscillation, which indicates good generalization. Finally, the parameter counts of the deep learning models are analyzed and discussed. 3D-CAE has fewer parameters than 2D-CNN and DCAE and the best classification performance, which demonstrates its efficiency. Conclusion The proposed 3D-CAE model fully uses the spectral and spatial features of hyperspectral images. It achieves unsupervised feature extraction without substantial pre-processing and high classification accuracy without a large amount of labeled data. It is therefore an effective method for hyperspectral image classification.

Key words

remote sensing image classification; spatial spectral feature fusion; 3D-CNN; auto-encoder; convolutional neural network(CNN); deep learning

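0 Introduction

The rapid development of remote sensing technology has transformed hyperspectral imaging, which has become one of the research hotspots of remote sensing in the 21st century (Tong et al., 2006; Du et al., 2016). A hyperspectral remote sensing image is acquired by a hyperspectral imaging spectrometer and contains both rich spatial texture information and spectral band information. It is characterized by high spectral resolution, a wide spectral range, and strong spectral correlation (Wan et al., 2021).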

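Classification is one of the main tasks in processing and applying hyperspectral remote sensing images. Its goal is to identify actual ground objects from the image data, that is, to assign a unique class to every pixel. In computer vision this is equivalent to semantic segmentation, which assigns a predefined semantic label to every pixel of an image (Zhao et al., 2016). For ground-object identification, detailed object features can therefore be obtained by analyzing spectral band features together with spatial texture features.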

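Early work on hyperspectral image classification relied mainly on traditional machine learning strategies that used either spectral or spatial features. Spectral feature methods include decision tree models (Swain and Hauska, 1977), independent component analysis (Villa et al., 2011), and the support vector machine (SVM) (Melgani and Bruzzone, 2004). However, spectral features alone make it difficult to discriminate classes in hyperspectral data directly (Chang, 2003): image spatial texture is left unused, and because the labeled samples available for supervised learning are limited, classification accuracy may fall as the feature dimensionality rises, which is known as the Hughes phenomenon. Spatial feature methods include graph-based features (Chen et al., 2011), recursive filtering (Kang et al., 2014), and sparse representations of morphological profiles (Song et al., 2014). Spatial feature extraction operates on the geometry of 2D images at particular bands and ignores the spectral correlation between adjacent bands. To fully exploit the discriminative information in hyperspectral data and further improve accuracy, spatial and spectral information must be combined during feature reduction (Fauvel et al., 2013; Liang et al., 2017). Spatial-spectral feature fusion is therefore a key problem in hyperspectral data analysis (Zhu et al., 2016).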

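Since convolutional neural networks, led by AlexNet, greatly outperformed traditional statistics-based computer vision methods in the ImageNet Large Scale Visual Recognition Challenge, deep learning has gradually been applied to hyperspectral remote sensing image processing (Krizhevsky et al., 2017). Compared with traditional methods, deep learning removes hand-crafted feature extraction and automatically learns abstract features from low-level texture to high-level semantics, so classification proceeds end to end and the accuracy on remote sensing images that contain a large amount of unknown information can improve considerably. The models mainly used in hyperspectral image classification today are the convolutional neural network (CNN), the auto-encoder-based stacked autoencoder (SAE), the deep belief network (DBN) (Zhang et al., 2016; Chen et al., 2015), the recurrent neural network (RNN) (Mei et al., 2019b), and long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997; Byeon et al., 2015).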

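Hu et al. (2015) used a five-layer one-dimensional convolutional neural network (1D-CNN) to classify hyperspectral images from spectral features, but it cannot fully exploit spatial context. Zhao and Du (2016) first reduced the dimensionality of the original image with principal component analysis, extracted spatial texture features from the reduced image with a 2D-CNN, extracted spectral features with the balanced local discriminant embedding algorithm, and merged the two kinds of features for classification to improve accuracy. The limitation of 2D-CNN is that the original image must be reduced in dimension so that its structure resembles an RGB image; the reduction loses information, wastes some properties specific to hyperspectral images, and does not use spectral information effectively (Zhang et al., 2018). Mou et al. (2017) were the first to represent each pixel of a hyperspectral image as a sequence, using an RNN to classify the image from spectral features treated as 1D sequences, and proposed a new activation function, PRetanh. Rußwurm and Körner (2017) used an LSTM network to extract features from SENTINEL 2A observation sequences. However, a poor choice of sequence in an RNN leads to over-fitting, and spectral and spatial information cannot be fully used (Paoletti et al., 2020).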

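SAE and DBN both learn remote sensing image features in an unsupervised way and extract and classify spatial-spectral features, but the spatial information of the original image is lost during training. 3D-CNN can extract spectral and spatial features simultaneously in an end-to-end manner, but existing 3D-CNN models are trained with supervision (Li et al., 2019). Because real-world hyperspectral remote sensing images have few labeled samples and most data are unlabeled, classification models based on unsupervised or semi-supervised training are needed. This paper takes the convolutional auto-encoder as the basic structure and, relying on the strong feature representation of CNNs, designs an unsupervised hyperspectral remote sensing image classification method based on a 3D-CNN convolutional auto-encoder (3D convolutional auto-encoder, 3D-CAE). The 3D-CNN uses 3×3×3 convolutions, and the model takes 3D hyperspectral data as input without complex pre-processing or post-processing. Sampling along all three dimensions at once, it extracts spectral and spatial features directly through end-to-end unsupervised learning. It achieves spatial-spectral fusion without a large amount of labeled data, has few parameters and a small model size, exploits the advantages of hyperspectral images, and helps improve classification accuracy (Romero et al., 2016; Lu et al., 2017). Batch normalization (BN) is added to prevent over-fitting.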

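1 Unsupervised 3D convolutional auto-encoder

1.1 3D-CNN structure

The strength of a traditional 2D-CNN is that it extracts features directly from three-channel RGB images for classification, detection, segmentation, and other downstream tasks (Hara et al., 2018). To exploit the ability of deep learning to extract remote sensing image features automatically, a 3D-CNN extracts features from 3D data blocks: the convolution kernel changes from 2D to 3D, and the convolution moves along three dimensions (spectral plus spatial) rather than two spatial dimensions, which achieves spatial-spectral fusion. The 3D convolution is computed as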

$ v_{i j}^{x y z}=k\left(\sum\limits_{m} \sum\limits_{p=0}^{P_{i}-1} \sum\limits_{q=0}^{Q_{i}-1} \sum\limits_{r=0}^{R_{i}-1} w_{i j m}^{p q r} v_{(i-1) m}^{(x+p)(y+q)(z+r)}+b_{i j}\right) $

(1)

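where $i$ indexes the layer, $j$ indexes the feature map in that layer, $v_{ij}^{xyz}$ is the output at position $(x, y, z)$ of the $j$th feature map in layer $i$, $P_{i}$ and $Q_{i}$ are the height and width of the convolution kernel, $R_{i}$ is the kernel size along the spectral dimension, $k$ is the activation function (ReLU throughout this paper), and $m$ indexes the feature maps of layer $i-1$ that are connected to the current feature map. $w^{pqr}_{ijm}$ is the weight at kernel position $(p, q, r)$ connected to the $m$th feature map, and $b_{ij}$ is the bias.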
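To make Eq. (1) concrete, the following is a minimal pure-Python sketch that computes one output feature map from a set of input feature maps. The function name `conv3d_single_map`, the nested-list data layout, and the absence of padding are assumptions made for illustration; the network in this paper keeps the output size equal to the input size (see Table 1), which implies padded convolutions in a deep learning framework rather than this loop.

```python
def relu(value):
    """Activation k in Eq. (1): max(0, value)."""
    return max(0.0, value)


def conv3d_single_map(feature_maps, kernels, bias):
    """Compute one output feature map v_ij following Eq. (1), without padding.

    feature_maps[m][x][y][z] : input feature maps of layer i-1
    kernels[m][p][q][r]      : weights w_ijm^{pqr} for this output map
    bias                     : scalar b_ij
    """
    num_maps = len(feature_maps)
    P, Q, R = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    X, Y, Z = (len(feature_maps[0]), len(feature_maps[0][0]),
               len(feature_maps[0][0][0]))
    output = []
    for x in range(X - P + 1):
        plane = []
        for y in range(Y - Q + 1):
            row = []
            for z in range(Z - R + 1):
                total = bias
                # Sum over input maps m and kernel offsets (p, q, r).
                for m in range(num_maps):
                    for p in range(P):
                        for q in range(Q):
                            for r in range(R):
                                total += (kernels[m][p][q][r]
                                          * feature_maps[m][x + p][y + q][z + r])
                row.append(relu(total))
            plane.append(row)
        output.append(plane)
    return output


# Hypothetical toy input: one 3x3x3 map of ones and one 3x3x3 kernel of ones.
ones_map = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)]]
ones_kernel = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)]]
print(conv3d_single_map(ones_map, ones_kernel, 0.5))  # [[[27.5]]]
```

The seven nested loops mirror the four sums and three output indices of Eq. (1) one to one, which is why the sketch is slow; it is meant only for reading alongside the formula.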

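1.2 Convolutional auto-encoder

Because SAE cannot extract the spatial features of hyperspectral remote sensing images, a convolutional auto-encoder (CAE) (Masci et al., 2011) is needed. The structure of a CAE is shown in Fig. 1.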

图 1 CAE结构

Fig. 1 The structure of CAE

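An auto-encoder is a neural network trained to copy its input to its output: it compresses the input into a latent-space representation and reconstructs the input from that representation. The network consists of an encoder and a decoder. The encoder compresses the input into the latent space, that is, it extracts image features; the decoder reconstructs the input from the latent representation, that is, it restores the extracted features to the original image. The encoder is written as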

$ \mathit{\boldsymbol{h}} = f(\mathit{\boldsymbol{x}}) $

(2)

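where $\boldsymbol{x}$ is the original input, $f$ is the convolution operation, corresponding to the feature-extracting convolution in Eq. (1), and $\boldsymbol{h}$ is the representation of the image in the latent space. The decoder is written as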

$ \mathit{\boldsymbol{r}} = g(\mathit{\boldsymbol{h}}) $

(3)

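where $g$ is the reconstruction operation, the reverse of the operation in Eq. (1), and $\boldsymbol{r}$ is the reconstructed image. The convolutional auto-encoder can therefore be written as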

$ g(f(\mathit{\boldsymbol{x}})) = \mathit{\boldsymbol{r}} $

(4)

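The optimization objective is to make the reconstruction $\boldsymbol{r}$ as close as possible to the original image $\boldsymbol{x}$: by training the convolutional auto-encoder to copy its input to its output, the hidden representation $\boldsymbol{h}$ acquires useful feature properties (Li et al., 2016). This objective is reached by placing constraints on the copying task. One way to obtain useful features from an auto-encoder is to constrain $\boldsymbol{h}$ to have a smaller dimension than the input $\boldsymbol{x}$; an auto-encoder under this constraint is called undercomplete. Learning an undercomplete hidden representation forces the auto-encoder to capture the most representative features of the training data. Because the training target is the input itself, no labeled data are required, which makes this unsupervised learning.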

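1.3 3D convolutional auto-encoder model

The proposed 3D-CAE hyperspectral remote sensing image classification model has three parts: data pre-processing, feature extraction, and a classifier. Feature extraction is carried out jointly by the encoder and the decoder. The model structure is shown in Fig. 2.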

图 2 本文3D-CAE结构

Fig. 2 The structure of the proposed 3D-CAE

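1.3.1 Data pre-processing

Hyperspectral remote sensing data are complex and hard to process directly. To speed up network training and improve accuracy, the image data are pre-processed (Martin and Plaza, 2012). Specifically, all data are scaled to [0, 1] by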

$ \mathit{\boldsymbol{x'}} = \frac{\mathit{\boldsymbol{x}}}{{{x_{{\rm{max}}}}}} $

(5)

where $\boldsymbol{x}$ is the current image data, $x_\text{max}$ is the maximum value of the image data used for training, and $\boldsymbol{x}′$ is the pre-processed image data.
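As a small illustration of Eq. (5), the sketch below divides every value by the maximum found in the training data. The function name `scale_by_training_max` and the toy spectra are hypothetical, and the sketch assumes non-negative data so that the scaled training values fall in [0, 1].

```python
def scale_by_training_max(spectra, x_max):
    """Apply Eq. (5): divide every value by the training-set maximum x_max."""
    if x_max <= 0:
        raise ValueError("x_max must be positive")
    return [[value / x_max for value in spectrum] for spectrum in spectra]


# Hypothetical training pixels, each a short list of band values.
train_spectra = [[120.0, 300.0], [60.0, 240.0]]
x_max = max(max(spectrum) for spectrum in train_spectra)  # 300.0
scaled = scale_by_training_max(train_spectra, x_max)
print(scaled)
```

Because $x_\text{max}$ is taken from the training data, values from other data that exceed it would map above 1 under this sketch.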

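1.3.2 Feature extraction

The encoder is a stack of three convolution blocks. Each block consists of 3D convolution layers, batch normalization (BN) layers, and a max-pooling layer, as shown in Fig. 3. A BN layer placed after each convolution re-standardizes the activations of the previous layer on every batch, so that the output of the preceding 3D convolution layer has a standard deviation close to 1 and a mean close to 0. During training this keeps the input distribution of each layer as stable as possible, which mitigates vanishing gradients, speeds up training, and improves generalization. Each block contains two 3D convolution layers so that useful features are extracted fully, and no pooling layer is placed directly between them, so the features from the first convolution are passed in full to the second. The max-pooling layer down-samples the features extracted by the 3D convolution layers. Pooling provides feature invariance: the information kept after pooling is scale-invariant and represents the most salient features of the image. Pooling also serves as feature reduction, keeping the core features while reducing parameters and computation, which helps prevent over-fitting and improves generalization. An activation function applied after each convolution adds non-linearity and increases the expressive power of the network. The overall encoder structure is shown in Fig. 4.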

图 3 卷积块结构

Fig. 3 The structure of the convolutional block

图 4 编码器结构

Fig. 4 The structure of the encoder

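The decoder mirrors the encoder, with the pooling down-sampling replaced by up-sampling.

All convolutions in the encoder and decoder use the ReLU activation function, defined as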

$ t(\mathit{\boldsymbol{x}}) = {\rm{max}}(0, \mathit{\boldsymbol{x}}) $

(6)

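where $\boldsymbol{x}$ is the input. As the activation function, ReLU defines the non-linear output of a neuron after the linear transformation $\boldsymbol{w}^\text{T}\boldsymbol{x}+\boldsymbol{b}$. In other words, for an input vector $\boldsymbol{x}$ arriving from the previous layer, a neuron that uses ReLU outputs $\text{max}(0, \boldsymbol{w}^\text{T}\boldsymbol{x}+\boldsymbol{b})$ to the next layer or as the output of the whole network.

Krizhevsky et al. (2017) found through extensive experiments that 3×3×3 is an effective kernel size for processing 3D data cubes, so every convolution kernel in the encoder and decoder of the CAE is 3×3×3, and the number of kernels per convolution layer increases as the network deepens.

The mean squared error (MSE) is used as the loss function of the decoder. MSE measures the deviation between the model's prediction and the true value; the error is computed on the data of one pixel, and its gradient is back-propagated to update the weights of each layer. It is defined as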

$ {f_{{\rm{MSE}}}} = \frac{1}{M}\sum\limits_{m = 1}^M {{{({y_m} - {{y'}_m})}^2}} $

(7)

where $M$ is the total number of samples, $y_m $ is the true data, and $y′_m $ is the data reconstructed by the decoder.
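The reconstruction loss in Eq. (7) can be sketched in a few lines of Python. The function name `mean_squared_error` and the flat-list representation of the original and reconstructed data are assumptions for illustration; in the actual model the loss is evaluated by the deep learning framework over whole tensors.

```python
def mean_squared_error(original, reconstructed):
    """Eq. (7): average squared difference between y_m and y'_m."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(original, reconstructed)]
    return sum(squared) / len(original)


# Hypothetical values: one of three entries is reconstructed with an error of 2.
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A perfect reconstruction gives a loss of 0, so a loss that settles near 0 during training indicates that the latent features retain most of the information in the input.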

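1.3.3 Classifier

The classifier is a three-layer fully connected network. The first layer has 256 neurons and the second has 128, both with ReLU activation. The number of neurons in the third layer equals the number of classes in the hyperspectral image data; it uses the Softmax activation function and outputs the probability of each class, which gives the classification result. The Softmax function is computed as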

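$ {S_i} = \frac{{{e^{z_i}}}}{{\sum\limits_j {{e^{z_j}}} }} $

(8)

where $S_{i}$ is the probability that the input belongs to class $i$, $z_{i}$ is the value that the last fully connected layer outputs for class $i$, and $j$ runs over all classes. The loss function of the classifier is the cross-entropy, which measures the difference between the probability distribution obtained by the current training and the true distribution; the smaller it is, the better. The cross-entropy is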

$ Loss = - \sum\limits_i {{y_i}{\rm{ln}}{a_i}} $

(9)

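where $y$ is the true class, $a$ is the output of the Softmax function, and $i$ is the index of the output class. The classifier structure is shown in Fig. 5.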
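The Softmax output of Eq. (8) and the cross-entropy loss of Eq. (9) can be sketched as follows. The function names, the one-hot encoding of the true class, and the subtraction of the maximum score for numerical stability are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Eq. (8): turn per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum leaves the result unchanged
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(one_hot_label, probabilities):
    """Eq. (9): -sum_i y_i * ln(a_i); only the true class contributes."""
    return -sum(y * math.log(a)
                for y, a in zip(one_hot_label, probabilities) if y > 0)


# Hypothetical scores for three classes; the true class is the last one.
probabilities = softmax([1.0, 2.0, 3.0])
loss = cross_entropy([0, 0, 1], probabilities)
print(probabilities, loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is the sense in which a smaller cross-entropy is better.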

图 5 分类器结构

Fig. 5 The structure of the classifier

Taking the Indian Pines dataset as an example, the network parameters of the proposed 3D-CAE model are listed in Table 1.

表 1 Indian Pines数据集上3D-CAE模型的图像尺寸变化和参数变化
Table 1 Image sizes and parameter quantities of the 3D-CAE model on the Indian Pines dataset


Layer (type)	Output shape	Parameters
input_1 (InputLayer)	(None, 200, 11, 11, 1)	0
conv3d_1 (Conv3D)	(None, 200, 11, 11, 16)	448
batch_normalization_1 (BatchNormalization)	(None, 200, 11, 11, 16)	64
conv3d_2 (Conv3D)	(None, 200, 11, 11, 16)	6 928
batch_normalization_2 (BatchNormalization)	(None, 200, 11, 11, 16)	64
max_pooling3d_1 (MaxPooling3D)	(None, 100, 5, 5, 16)	0
conv3d_3 (Conv3D)	(None, 100, 5, 5, 32)	13 856
batch_normalization_3 (BatchNormalization)	(None, 100, 5, 5, 32)	128
conv3d_4 (Conv3D)	(None, 100, 5, 5, 32)	27 680
batch_normalization_4 (BatchNormalization)	(None, 100, 5, 5, 32)	128
max_pooling3d_2 (MaxPooling3D)	(None, 50, 2, 2, 32)	0
conv3d_5 (Conv3D)	(None, 50, 2, 2, 64)	55 360
batch_normalization_5 (BatchNormalization)	(None, 50, 2, 2, 64)	256
conv3d_6 (Conv3D)	(None, 50, 2, 2, 64)	110 656
encoder_global (BatchNormalization)	(None, 50, 2, 2, 64)	256
encoder_out (MaxPooling3D)	(None, 25, 1, 1, 64)	0
up_sampling3d_1 (UpSampling3D)	(None, 50, 2, 2, 64)	0
batch_normalization_6 (BatchNormalization)	(None, 50, 2, 2, 64)	256
conv3d_7 (Conv3D)	(None, 50, 2, 2, 64)	110 656
batch_normalization_7 (BatchNormalization)	(None, 50, 2, 2, 64)	256
conv3d_8 (Conv3D)	(None, 50, 2, 2, 64)	110 656
up_sampling3d_2 (UpSampling3D)	(None, 100, 4, 4, 64)	0
batch_normalization_8 (BatchNormalization)	(None, 100, 4, 4, 64)	256
conv3d_9 (Conv3D)	(None, 100, 4, 4, 32)	55 328
batch_normalization_9 (BatchNormalization)	(None, 100, 4, 4, 32)	128
conv3d_10 (Conv3D)	(None, 100, 4, 4, 32)	27 680
up_sampling3d_3 (UpSampling3D)	(None, 200, 8, 8, 32)	0
batch_normalization_10 (BatchNormalization)	(None, 200, 8, 8, 32)	128
zero_padding3d_1 (ZeroPadding3D)	(None, 200, 11, 11, 32)	0
conv3d_11 (Conv3D)	(None, 200, 11, 11, 16)	13 840
batch_normalization_11 (BatchNormalization)	(None, 200, 11, 11, 16)	64
conv3d_12 (Conv3D)	(None, 200, 11, 11, 16)	6 928
conv3d_13 (Conv3D)	(None, 200, 11, 11, 1)	433
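Note: the output shape is (None, w, p, q, t), where None is the batch size, a hyperparameter of the model set to 16 in the experiments; w is the spectral dimension; p and q are the image size at a given band; and t is the number of channels.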
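The parameter counts and output shapes in Table 1 can be cross-checked with a short calculation. The sketch below assumes the usual conventions of Keras-style layers, which are not stated explicitly in the paper: a 3D convolution with a k×k×k kernel has k³·C_in·C_out weights plus C_out biases, a batch normalization layer holds four values per channel, and 2×2×2 max pooling halves each dimension with rounding down. The helper names are hypothetical.

```python
def conv3d_params(kernel, in_channels, out_channels):
    """Weights plus biases of a 3D convolution with a cubic kernel."""
    return kernel ** 3 * in_channels * out_channels + out_channels


def batch_norm_params(channels):
    """Scale, shift, moving mean, and moving variance per channel."""
    return 4 * channels


def max_pool(shape, factor=2):
    """Halve every dimension, rounding down."""
    return tuple(size // factor for size in shape)


# Rows of Table 1 reproduced by the formulas above.
print(conv3d_params(3, 1, 16))    # conv3d_1: 448
print(conv3d_params(3, 16, 16))   # conv3d_2: 6928
print(batch_norm_params(16))      # batch_normalization_1: 64

# Three pooling steps take the (200, 11, 11) input cube to the latent size.
shape = (200, 11, 11)
for _ in range(3):
    shape = max_pool(shape)
latent_values = shape[0] * shape[1] * shape[2] * 64
print(shape, latent_values)       # (25, 1, 1) 1600
```

Under these assumptions the encoder maps the 200×11×11 = 24 200 input values to 1 600 latent values, so the latent representation is smaller than the input, in line with the undercomplete setting described in Section 1.2.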

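Using 3D-CAE as a hyperspectral remote sensing image classification model has the following advantages:

1) It learns without supervision, so the model can be trained without a large amount of labeled data, which matters because data labeling currently costs considerable manpower and resources;

2) It extracts features by spatial-spectral fusion; the 3D-CNN effectively extracts both spatial texture features and spectral band features;

3) The encoder and decoder are highly extensible; for different tasks the network structure can be replaced freely, or modules suited to the task can be added;

4) Feature extraction by the auto-encoder reduces dimensionality and noise, which makes the model more robust (Liang et al., 2018).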

2 Experiments

2.1 Datasets

To compare 3D-CAE with other models, experiments are run on four datasets: Indian Pines, Salinas, Pavia University, and Botswana. Details of the datasets are given in Table 2.

表 2 四个高光谱遥感图像分类数据集的具体参数
Table 2 Specific parameters of four hyperspectral remote sensing image classification datasets


	Dataset
	Indian Pines	Salinas	Pavia University	Botswana
Acquisition year	1992	1992	2001	2001
Location	Indiana, USA	California, USA	Northern Italy	Okavango Delta, Botswana
Sensor	AVIRIS	AVIRIS	ROSIS	Hyperion
Image size/pixels	145×145	512×217	610×340	1 476×256
Spatial resolution/m	20	3.7	1.3	30
Bands	200	204	103	145
Classes	16	16	9	14
Samples	10 249	54 129	42 776	3 248

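2.2 Evaluation metrics

Overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (Kappa) are used to compare the models quantitatively.

1) OA is the ratio of the number of correctly classified pixels to the total number of pixels. It characterizes classification accuracy well; higher is better.

2) AA is the mean of the per-class recall; higher is better.

3) Kappa represents the proportion by which the classification reduces errors relative to a completely random classification; higher is better.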
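The three metrics can be computed from a confusion matrix, as in the minimal sketch below. The function name `classification_metrics`, the convention that rows are true classes and columns are predicted classes, and the toy matrix are assumptions for illustration; the sketch also assumes that every class has at least one sample.

```python
def classification_metrics(confusion):
    """Return (OA, AA, Kappa) for a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    overall_accuracy = correct / total

    # AA: mean of the per-class recall.
    recalls = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    average_accuracy = sum(recalls) / n

    # Kappa: agreement beyond what the row and column totals give by chance.
    column_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(sum(confusion[i]) * column_sums[i]
                   for i in range(n)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return overall_accuracy, average_accuracy, kappa


# Hypothetical two-class example with 10 test pixels.
oa, aa, kappa = classification_metrics([[5, 1], [2, 2]])
print(oa, aa, kappa)
```

In this toy example OA is 0.7 while AA is lower because the smaller class is recognized only half of the time, which illustrates why the paper reports AA and Kappa alongside OA for datasets with unbalanced classes.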

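2.3 Experimental parameters and environment

This paper designs a 3D-CNN-based auto-encoder model for hyperspectral remote sensing image classification, in which the encoder extracts features and the decoder reconstructs the image. The features extracted by the encoder are fed to fully connected layers for classification, and the weights of the auto-encoder and the classifier are updated by back-propagating the gradients of their loss functions. The batch size of both the encoder and the classifier is 16, that is, 16 pixel samples are input in each iteration; choosing a batch size that is a multiple of 2 makes full use of the parallel computing capability of the GPU. The Adam optimizer, an extension of stochastic gradient descent that saves memory and is computationally efficient, is used to minimize the loss functions, and the learning rate of both the auto-encoder and the classifier is set to 0.001. The experiments were run on a computer with an Intel Core i7-8750H CPU, 16 GB of RAM, and a GTX 1060 GPU; the software environment is a deep learning stack based on Keras.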

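2.4 Results and analysis

The comparison methods cover traditional and deep learning approaches. The traditional methods are the decision tree (Li and Zhang, 2003) and the support vector machine (Wang et al., 2016). The deep learning methods are 1D-CNN (Hu et al., 2015), 2D-CNN (Chen et al., 2016), the multi-layer perceptron (MLP) (Paoletti et al., 2019), the gated recurrent unit (GRU) (Paoletti et al., 2020), and the deep convolutional autoencoder (DCAE) (Mei et al., 2019a). The training curves of the 3D-CAE model on the different datasets are shown in Fig. 6.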

图 6 不同数据集在3D-CAE模型的训练结果

Fig. 6 The training results of the 3D-CAE on different datasets

((a)Indian Pines; (b)Salinas; (c)Pavia University; (d)Botswana)

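2.4.1 Indian Pines dataset

For the Indian Pines dataset, the input to the 3D-CAE model has size 11×11×200, that is, an 11×11-pixel spatial neighborhood across 200 bands.

On Indian Pines, the auto-encoder loss reaches a very low value after 10 training epochs, which indicates that the reconstruction is almost identical to the original data and that feature extraction is effective. The classifier loss stabilizes at a low value after 100 epochs, indicating good classification. The training curves of the auto-encoder and the classifier are shown in Fig. 6(a).

The decision tree, SVM, 1D-CNN, 2D-CNN, and MLP each have at least one class with an accuracy of 0, and GRU and DCAE fall to 0.125 and 0.282 1 on their worst classes (Table 3). The failures are most pronounced for classes with few training samples, such as alfalfa and unmown grassland. 3D-CAE stays at 0.914 7 or above on every class, which shows that it keeps high accuracy without a large amount of labeled data; its OA, AA, and Kappa reach 0.948 7, 0.936 0, and 0.941 5, respectively. Detailed results of the different models are listed in Table 3, and the classification maps are shown in Fig. 7.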

表 3 Indian Pines数据集的分类精度
Table 3 The classification results of Indian Pines


Class	Training samples	Test samples	Decision tree	SVM	1D-CNN	2D-CNN	MLP	GRU	DCAE	3D-CAE
1	7	39	0.076 9	0	0	0	0	0.375	0.282 1	0.923 1
2	294	1 134	0.461 2	0.403 0	0.985 9	0.943 6	0.407 4	0.793 2	0.913 6	0.943 6
3	150	680	0.071 2	0.216 2	0.129 4	0.982 4	0.092 6	0.692 3	0.785 3	0.933 8
4	43	194	0.041 2	0.041 2	0.185 6	1	0.108 2	0.576 9	0.706 2	0.933 0
5	96	387	0.408 3	0.108 5	0.495 3	0.307 5	0.568 5	0.932 9	0.819 1	0.914 7
6	167	563	0.738 9	0.964 5	0.806 4	0.966 3	0.968 0	0.977 2	0.900 5	0.952 0
7	8	20	0.350 0	0	0	0	0	0.7	0.55	1
8	97	381	0.574 8	0.994 8	0.994 8	0.997 4	0.994 8	0.988	0.976 4	0.976 4
9	3	17	0	0	0	0	0	0.125	0.294 1	0.941 2
10	193	779	0.335 0	0.313 2	0.007 7	0.924 3	0.295 3	0.782 1	0.881 9	0.923 0
11	469	1 986	0.505 5	0.893 8	0.000 5	0.998 5	0.861 5	0.854 5	0.738 2	0.963 2
12	130	463	0.142 5	0.453 6	0.008 6	1	0.067 0	0.746 2	0.777 5	0.941 7
13	51	154	0.889 6	0.987 0	0.746 8	0.987 0	0.844 2	0.983 3	0.993 5	0.993 5
14	238	1 027	0.172 3	0.991 2	0.960 1	0.998 1	0.872 4	0.884 5	0.808 2	0.957 2
15	86	300	0.416 7	0.05	0.553 3	0.990 0	0.163 3	0.898 3	0.653 3	0.940 0
16	17	76	0.618 4	0.934 2	0.907 9	0	0.947 4	0.966 7	0.934 2	0.973 7
OA	-	-	0.390 2	0.617 1	0.440 6	0.928 6	0.586 4	0.836 6	0.816 2	0.948 7
AA	-	-	0.450 2	0.554 8	0.580 3	0.709 5	0.506 2	0.767 3	0.829 2	0.936 0
Kappa	-	-	0.307 9	0.547 6	0.368 3	0.918 0	0.509 9	0.813 3	0.791 8	0.941 5
Note: bold indicates the best result in each row; "-" indicates no data.

图 7 Indian Pines数据集分类图

Fig. 7 The classification of Indian Pines

((a)ground truth; (b)decision tree; (c)SVM; (d)1D-CNN; (e)2D-CNN; (f)MLP; (g)GRU; (h)DCAE; (i)3D-CAE)

2.4.2 Salinas dataset

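For the Salinas dataset, the input to the 3D-CAE model is an 11×11-pixel spatial neighborhood across the spectral bands of the dataset (204 bands according to Table 2).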

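On Salinas, the auto-encoder loss reaches a very low value after 20 epochs, which shows that 3D-CAE needs few training iterations to extract effective features. Compared with 2D-CNN, whose feature extraction is complex and needs many training iterations, 3D-CAE greatly improves training efficiency. After 50 epochs the classifier loss stays low without oscillation, indicating that the model is stable and efficient, with high accuracy: OA, AA, and Kappa reach 0.986 6, 0.992 4, and 0.985 1, respectively. The training curves of the auto-encoder and the classifier are shown in Fig. 6(b).

MLP shows a large error on the untrained vineyard class (class 15, 0.662 2) and cannot discriminate it well even though the class has many samples; GRU and DCAE do not reach the best result on it either. 2D-CNN fails completely on the broccoli green weeds class (Table 4 shows an accuracy of 0 for classes 1, 2, and 7), whereas 3D-CAE is stable, with every class at 0.946 5 or above and the highest OA, AA, and Kappa. Detailed results of the different models are listed in Table 4, and the classification maps are shown in Fig. 8.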

表 4 Salinas数据集的分类精度
Table 4 The classification results of Salinas dataset


Class	Training samples	Test samples	Decision tree	SVM	1D-CNN	2D-CNN	MLP	GRU	DCAE	3D-CAE
1	385	1 624	0.992 6	0.971 7	0.989 5	0	0.992 0	0.983 7	1	1
2	735	2 991	0.997 0	0.991 6	0.992 0	0	0.995 3	1	0.998 3	0.999 0
3	413	1 563	0.961 6	0.682 7	0.993 6	0.996 8	0.995 5	1	0.963 5	0.998 7
4	265	1 129	0.988 5	0.985 8	0.990 3	1	0.993 8	1	0.991 1	0.990 3
5	530	2 148	0.968 3	0.979 5	0.894 3	0.995 3	0.992 8	0.983 0	0.997 7	0.996 7
6	790	3 169	0.990 5	0.998 4	0.999 1	0.998 4	0.998 7	1	1	1
7	677	2 902	0.994 1	0.992 8	0.994 1	0	0.993 8	1	0.998 6	0.999 0
8	2 293	8 978	0.849 4	0.910 7	0.931 8	0.999 6	0.835 4	0.895 4	0.936 2	0.946 5
9	1 243	4 960	0.980 8	0.986 1	0.997 2	1	0.997 6	1	1	1
10	659	2 619	0.876 7	0.855 3	0.935 5	1	0.908 7	0.989 5	0.981 3	0.995 4
11	223	845	0.868 6	0.462 7	0.934 9	0.996 4	0.887 6	0.985 7	0.987 0	0.991 7
12	391	1 563	0.957 7	1	0.996 1	0.999 3	0.996 1	1	1	0.999 3
13	181	735	0.893 9	0.979 6	0.970 1	0.982 3	0.979 6	1	0.997 3	0.997 3
14	226	844	0.950 2	0.912 3	0.956 2	0.997 6	0.940 8	0.953 8	0.994 1	1
15	1 465	5 803	0.530 2	0.349 3	0.393 6	0.999 3	0.662 2	0.737 2	0.864 0	0.991 0
16	349	1 458	0.969 8	0.976 0	0.991 1	1	0.969 1	0.991 2	0.995 9	1
OA	-	-	0.883 4	0.855 5	0.890 1	0.825 3	0.907 6	0.939 5	0.964 9	0.986 6
AA	-	-	0.932 4	0.911 7	0.943 1	0.753 8	0.947 6	0.97	0.982 4	0.992 4
Kappa	-	-	0.869 9	0.830	0.877 1	0.806 1	0.897 0	0.932 7	0.961 0	0.985 1
Note: bold indicates the best result in each row; "-" indicates no data.

图 8 Salinas数据集分类图

Fig. 8 The classification of Salinas

((a)ground truth; (b)decision tree; (c)SVM; (d)1D-CNN; (e)2D-CNN; (f)MLP; (g)GRU; (h)DCAE; (i)3D-CAE)

2.4.3 Pavia University dataset

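For the Pavia University dataset, the input to the 3D-CAE model is an 11×11-pixel spatial neighborhood across the spectral bands of the dataset (103 bands according to Table 2).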

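The training losses of the auto-encoder and the classifier on Pavia University are shown in Fig. 6(c). The auto-encoder loss stays low after 40 epochs, and the classifier loss stays low without oscillation after 50 epochs. 2D-CNN shows a large error: it classifies the class with the most training samples correctly but none of the others. 3D-CAE performs best, with OA, AA, and Kappa of 0.986 2, 0.982 9, and 0.981 7, respectively. The decision tree, SVM, 1D-CNN, and MLP cannot identify gravel and bricks well (classes 3 and 8 in Table 5) and also fail on class 7, scoring below 0.5 on each, whereas GRU, DCAE, and 3D-CAE handle these classes. The bricks class has a relatively large number of samples, which shows that 3D-CAE also performs well when plenty of data are available. Detailed results of the different models are listed in Table 5, and the classification maps are shown in Fig. 9.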

表 5 Pavia University数据集的分类精度
Table 5 The classification results of Pavia University


Class	Training samples	Test samples	Decision tree	SVM	1D-CNN	2D-CNN	MLP	GRU	DCAE	3D-CAE
1	1 319	5 312	0.955 4	0.990 4	0.991 5	0	0.994 0	0.851 8	0.982 3	0.981 6
2	3 747	14 902	0.508 3	0.887 7	0.663 2	1	0.609 2	0.978 4	0.991 2	0.997 9
3	421	1 678	0.415 4	0.237 2	0.465 4	0	0.429 1	0.794 3	0.933 2	0.962 5
4	579	2 485	0.852 7	0.883 7	0.917 1	0	0.894 2	0.934 9	0.985 3	0.985 9
5	279	1 066	0.868 7	0.970 9	0.986 9	0	0.985 0	0.992 2	1	1
6	991	4 038	0.701 8	0.673 8	0.908 9	0	0.874 2	0.821 3	0.949 7	0.984 4
7	265	1 065	0.047 9	0	0.080 8	0	0.031 9	0.759 3	0.960 3	0.934 4
8	750	2 932	0.143 2	0.175 0	0.197 5	0	0.108 1	0.895 6	0.981 8	0.965 2
9	204	743	0.996 0	0.998 7	0.998 7	0	1	1	0.994 7	0.990 6
OA	-	-	0.598 3	0.762 5	0.711 2	0.435 4	0.671 3	0.931 8	0.980 1	0.986 2
AA	-	-	0.595 7	0.681 1	0.724 6	0.048 3	0.769 2	0.903 1	0.981 3	0.982 9
Kappa	-	-	0.500 7	0.682 6	0.634 9	0	0.586 5	0.908 6	0.973 6	0.981 7
Note: bold indicates the best result in each row; "-" indicates no data.

图 9 Pavia University数据集分类图

Fig. 9 The classification of Pavia University

((a)ground truth; (b)decision tree; (c)SVM; (d)1D-CNN; (e)2D-CNN; (f)MLP; (g)GRU; (h)DCAE; (i)3D-CAE)

2.4.4 Botswana dataset

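For the Botswana dataset, the input to the 3D-CAE model is an 11×11-pixel spatial neighborhood across the spectral bands of the dataset (145 bands according to Table 2).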

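The training curves of the auto-encoder and the classifier are shown in Fig. 6(d). The auto-encoder loss stays low after 10 epochs, and the classifier loss levels off after 100 epochs without oscillation, so training is stable. While the decision tree, SVM, 1D-CNN, and 2D-CNN each have classes with an accuracy of 0 (Table 6), 3D-CAE remains stable, with OA, AA, and Kappa of 0.964 9, 0.965 9, and 0.962 0, respectively, the best among the compared models. Detailed results of the different models are listed in Table 6.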

表 6 Botswana数据集的分类精度
Table 6 The classification results of Botswana


Class	Training samples	Test samples	Decision tree	SVM	1D-CNN	2D-CNN	MLP	GRU	DCAE	3D-CAE
1	48	222	0.986 5	1	1	1	1	1	1	0.995 5
2	19	82	0.878 0	0.804 9	0.841 5	0	1	1	1	1
3	46	205	0.039	0.795 1	0.985 4	0	0.985 4	1	0.961	0.951 2
4	50	165	0	0.490 9	0.969 7	1	0.890 9	0.988 4	0.987 9	0.963 6
5	62	207	0.222 2	0.256 0	0.241 5	0.884 1	0.608 7	0.869 8	0.932 4	0.826 1
6	52	217	0.502 3	0	0.318 0	1	0.603 7	0.832 6	0.861 8	0.880 2
7	52	207	0.985 5	0.946 9	0.932 4	1	0.975 8	0.990 3	0.995 2	1
8	41	162	0.006 2	0.401 2	0.024 7	1	0.709 9	0.993 8	1	1
9	64	250	0.328 0	0.972 0	0.928 0	0.992 0	0.944 0	0.948 2	0.776 0	0.968 0
10	51	197	0.248 7	0.010 2	0.035 5	1	0.208 1	0.929 6	0.923 9	1
11	48	257	0.070 0	0.182 9	0.108 9	1	0.801 6	0.991 8	1	1
12	47	134	0	0.014 9	0	0	0.582 1	0.896 6	1	0.992 5
13	52	216	0.111 1	0.046 3	0	1	0.504 6	0.958 1	0.972 2	0.990 7
14	17	78	0.038 5	0.282 1	0.025 6	0	0.641 0	1	1	0.987 2
OA	-	-	0.379 3	0.450 9	0.476 3	0.797 9	0.749 1	0.953 1	0.949 2	0.964 9
AA	-	-	0.410 6	0.490 5	0.457 0	0.587 8	0.797 1	0.957 1	0.951 7	0.965 9
Kappa	-	-	0.325 2	0.404 5	0.432 2	0.780 2	0.727 5	0.949 1	0.944 9	0.962 0
Note: bold indicates the best result in each row; "-" indicates no data.

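2.4.5 Ablation experiment

Because 3D-CAE consists of an auto-encoder and a classifier, an ablation experiment is run on the model to verify the effectiveness of the auto-encoder. Each of the four datasets is classified by four classifiers with different structures, all using exactly the same features. Every classifier is a neural network with two hidden layers; only the number of neurons per layer changes, with hidden layers of 64-32, 128-64, 256-128, and 512-256 neurons. The results for the 128-64 and 512-256 structures differ very little, which supports the effectiveness of the features extracted by the auto-encoder. Detailed results are given in Table 7.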

表 7 消融性实验结果
Table 7 Ablation experiment results


Dataset	Metric	Neurons in the two hidden layers
		64-32	128-64	256-128	512-256
Indian Pines	OA	0.850 4	0.948 7	0.941 4	0.946 4
Indian Pines	AA	0.828 2	0.936 0	0.949 1	0.956 4
Indian Pines	Kappa	0.828 5	0.941 5	0.933 1	0.938 8
Salinas	OA	0.987 1	0.986 6	0.992 4	0.991 0
Salinas	AA	0.993 4	0.992 4	0.992 4	0.995 1
Salinas	Kappa	0.985 7	0.985 1	0.996 4	0.990 0
Pavia University	OA	0.985 6	0.986 2	0.988 9	0.965 2
Pavia University	AA	0.982 0	0.982 9	0.986 6	0.950 4
Pavia University	Kappa	0.981 0	0.981 7	0.985 2	0.958 3
Botswana	OA	0.989 2	0.964 9	0.993 0	0.984 6
Botswana	AA	0.987 8	0.965 9	0.992 9	0.981 2
Botswana	Kappa	0.988 3	0.962 0	0.992 4	0.983 3
Note: bold indicates the best result in each row.

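2.4.6 Comparison of model generalization

To demonstrate the generalization of 3D-CAE, all models are evaluated on the four datasets with five training-set proportions (5%, 8%, 10%, 15%, and 20%), and the OA of each model is compared in line charts.

The results of the different models on Indian Pines are shown in Fig. 10(a). The proposed method gives the best result at all five proportions; OA rises with the training proportion without oscillation, and it remains the highest even when training samples are few.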

图 10 各模型不同训练集比例分类结果

Fig. 10 Classification results of different training set proportion of each model

((a)Indian Pines; (b)Salinas; (c)Pavia University; (d)Botswana)

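The results of the different models on Salinas are shown in Fig. 10(b). With only 5% of the data used for training, the OA of the proposed 3D-CAE is still close to 1; as the training proportion grows, OA shows no large oscillation and remains stable, and none of the other models reaches the accuracy of 3D-CAE at any of the five proportions.

The results of the different models on Pavia University are shown in Fig. 10(c). 3D-CAE performs almost identically across the five proportions, remaining stable with the highest OA. DCAE matches 3D-CAE when 20% of the data are used for training.

The results of the different models on Botswana are shown in Fig. 10(d). Because the ground-object features in this dataset are distinct and easy to identify, 3D-CAE, GRU, and DCAE have similar OA at 8%, 10%, and 20%, but 3D-CAE keeps the highest accuracy at all five proportions, and the other models perform poorly.

Across the five proportions on the four datasets, the OA of the proposed 3D-CAE stays the highest and is stable, with little sensitivity to the amount of training data. This indicates good generalization and also reflects the advantage of the unsupervised approach.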

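2.4.7 Comparison of deep learning model parameters

To show that the proposed 3D-CAE has relatively few parameters, the parameter counts of the deep learning models are compared in Table 8, together with their results on Indian Pines. 2D-CNN has the most parameters and obtains good OA and Kappa but a poor AA. GRU, benefiting from its recurrent structure, has few parameters, but its classification results are weaker. DCAE has nearly as many parameters as 2D-CNN yet trails it on every metric except AA. The proposed 3D-CAE has fewer parameters than the supervised 2D-CNN and the unsupervised DCAE and still achieves the best result on all three metrics.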

表 8 深度学习模型参数
Table 8 Deep learning model parameters


Model	Parameters	OA	AA	Kappa
1D-CNN	201 756	0.440 6	0.580 3	0.368 3
2D-CNN	842 096	0.928 6	0.709 5	0.918 0
GRU	243 280	0.836 6	0.767 3	0.813 3
DCAE	833 823	0.816 2	0.829 2	0.791 8
3D-CAE	755 665	0.948 7	0.936 0	0.941 5
Note: bold indicates the best result in each column.

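3 Conclusion

Building on the convolutional auto-encoder and 3D-CNN, this paper proposes an unsupervised hyperspectral remote sensing image classification method. The 3D convolutional auto-encoder (3D-CAE) extracts the spatial and spectral features of hyperspectral data simultaneously, making full use of the characteristics of the data and achieving spatial-spectral fusion. While the encoder is trained, the data are also denoised and reduced in dimension end to end, which improves efficiency and removes manual denoising. The decoder restores the extracted features to the original data to check whether the features extracted by the encoder are effective, and the classifier classifies the features extracted by the encoder. The structures of the encoder and decoder can be chosen to suit the problem at hand, the overall model needs no complex pre-processing, and it achieves good results on several large public datasets.

Future work will further study unsupervised, semi-supervised, and self-supervised hyperspectral image classification and explore other classification methods that need no prior information, in order to address the difficulty of labeling and the scarcity of labeled data.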

References

Byeon W, Breuel T M, Raue F and Liwicki M. 2015. Scene labeling with LSTM recurrent neural networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3547-3555[DOI: 10.1109/CVPR.2015.7298977]

Chang C I. 2003. Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Springer US: 15-34

Chen X, Fang T, Huo H, Li D R. 2011. Graph-based feature selection for object-oriented classification in VHR airborne imagery. IEEE Transactions on Geoscience and Remote Sensing, 49(1): 353-365 [DOI:10.1109/TGRS.2010.2054832]

Chen Y S, Jiang H L, Li C Y, Jia X P, Ghamisi P. 2016. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10): 6232-6251 [DOI:10.1109/TGRS.2016.2584107]

Chen Y S, Zhao X, Jia X P. 2015. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6): 2381-2392 [DOI:10.1109/JSTARS.2015.2388577]

Du P J, Xia J S, Xue Z H, Tan K, Su H J, Bao R. 2016. Review of hyperspectral remote sensing image classification. Journal of Remote Sensing, 20(2): 236-256 (杜培军, 夏俊士, 薛朝辉, 谭琨, 苏红军, 鲍蕊. 2016. 高光谱遥感影像分类研究进展. 遥感学报, 20(2): 236-256) [DOI:10.11834/jrs.20165022]

Fauvel M, Tarabalka Y, Benediktsson J A, Chanussot J, Tilton J C. 2013. Advances in spectral-spatial classification of hyperspectral images. Proceedings of the IEEE, 101(3): 652-675 [DOI:10.1109/JPROC.2012.2197589]

Hara K, Kataoka H and Satoh Y. 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6546-6555[DOI: 10.1109/CVPR.2018.00685]

Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780 [DOI:10.1162/neco.1997.9.8.1735]

Hu W, Huang Y Y, Wei L, Zhang F, Li H C. 2015. Deep convolutional neural networks for hyperspectral image classification. Journal of Sensors, 2015: #258619 [DOI:10.1155/2015/258619]

Kang X D, Li S T, Benediktsson J A. 2014. Feature extraction of hyperspectral images with image fusion and recursive filtering. IEEE Transactions on Geoscience and Remote Sensing, 52(6): 3742-3752 [DOI:10.1109/TGRS.2013.2275613]

Krizhevsky A, Sutskever I, Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI:10.1145/3065386]

Li G D, Zhang C J, Gao F, Zhang X Y. 2019. Doubleconvpool-structured 3D-CNN for hyperspectral remote sensing image classification. Journal of Image and Graphics, 24(4): 639-654 (李冠东, 张春菊, 高飞, 张雪英. 2019. 双卷积池化结构的3D-CNN高光谱遥感影像分类方法. 中国图象图形学报, 24(4): 639-654) [DOI:10.11834/jig.180422]

Li W J, Fu H H, Yu L, Gong P, Feng D L, Li C C, Clinton N. 2016. Stacked autoencoder-based deep learning for remote-sensing image classification: a case study of African land-cover mapping. International Journal of Remote Sensing, 37(23): 5632-5646 [DOI:10.1080/01431161.2016.1246775]

Li S, Zhang E X. 2003. The decision tree classification and its application in land cover. Areal Research and Development, 22(1): 17-21 (李爽, 张二勋. 2003. 基于决策树的遥感影像分类方法研究. 地域研究与发展, 22(1): 17-21) [DOI:10.3969/j.issn.1003-2363.2003.01.005]

Liang J, Zhou J, Qian Y T, Wen L, Bai X, Gao Y S. 2017. On the sampling strategy for evaluation of spectral-spatial methods in hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(2): 862-880 [DOI:10.1109/TGRS.2016.2616489]

Liang P, Shi W Z, Zhang X K. 2018. Remote sensing image classification based on stacked denoising autoencoder. Remote Sensing, 10(1): #16 [DOI:10.3390/rs10010016]

Lu X Q, Zheng X T, Yuan Y. 2017. Remote sensing scene classification by unsupervised representation learning. IEEE Transactions on Geoscience and Remote Sensing, 55(9): 5148-5157 [DOI:10.1109/TGRS.2017.2702596]

Martin G, Plaza A. 2012. Spatial-spectral preprocessing prior to endmember identification and unmixing of remotely sensed hyperspectral data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2): 380-395 [DOI:10.1109/JSTARS.2012.2192472]

Masci J, Meier U, Cireşan D and Schmidhuber J. 2011. Stacked convolutional auto-encoders for hierarchical feature extraction//Honkela T, Duch W, Girolami M and Kaski S, eds. Artificial Neural Networks and Machine Learning-ICANN. Berlin, Germany: Springer: 52-59[DOI: 10.1007/978-3-642-21735-7_7]

Mei S H, Ji L Y, Geng Y H, Zhang Z, Xu L, Du Q. 2019a. Unsupervised spatial-spectral feature learning by 3D convolutional autoencoder for hyperspectral classification. IEEE Transactions on Geoscience and Remote Sensing, 57(9): 6808-6820 [DOI:10.1109/TGRS.2019.2908756]

Mei X G, Pan E T, Ma Y, Dai X B, Huang J, Fan F, Du Q L, Zheng H, Ma J Y. 2019b. Spectral-spatial attention networks for hyperspectral image classification. Remote Sensing, 11(8): #963 [DOI:10.3390/rs11080963]

Melgani F, Bruzzone L. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8): 1778-1790 [DOI:10.1109/TGRS.2004.831865]

Mou L C, Ghamisi P, Zhu X X. 2017. Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3639-3655 [DOI:10.1109/TGRS.2016.2636241]

Paoletti M E, Haut J M, Plaza J, Plaza A. 2019. Deep learning classifiers for hyperspectral imaging: a review. ISPRS Journal of Photogrammetry and Remote Sensing, 158: 279-317 [DOI:10.1016/j.isprsjprs.2019.09.006]

Paoletti M E, Haut J M, Plaza J, Plaza A. 2020. Scalable recurrent neural network for hyperspectral image classification. The Journal of Supercomputing, 76(11): 8866-8882 [DOI:10.1007/s11227-020-03187-0]

Romero A, Gatta C, Camps-Valls G. 2016. Unsupervised deep feature extraction for remote sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 54(3): 1349-1362 [DOI:10.1109/TGRS.2015.2478379]

Rußwurm M and Körner M. 2017. Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE: 1496-1504[DOI: 10.1109/CVPRW.2017.193]

Song B Q, Li J, Mauro D M, Li P J, Plaza A, Bioucas-Dias J M, Benediktsson J A, Chanussot J. 2014. Remotely sensed image classification using sparse representations of morphological attribute profiles. IEEE Transactions on Geoscience and Remote Sensing, 52(8): 5122-5136 [DOI:10.1109/TGRS.2013.2286953]

Swain P H, Hauska H. 1977. The decision tree classifier: design and potential. IEEE Transactions on Geoscience Electronics, 15(3): 142-147 [DOI:10.1109/TGE.1977.6498972]

Tong Q X, Zhang B, Zheng L F. 2006. Hyperspectral Remote Sensing. Beijing: Higher Education Press (童庆禧, 张兵, 郑兰芬. 2006. 高光谱遥感: 原理、技术与应用. 北京: 高等教育出版社)

Villa A, Benediktsson J A, Chanussot J, Jutten C. 2011. Hyperspectral image classification with independent component discriminant analysis. IEEE Transactions on Geoscience and Remote Sensing, 49(12): 4865-4876 [DOI:10.1109/TGRS.2011.2153861]

Wan Y L, Zhong X W, Liu H and Qian Y R. 2021. Survey of application of convolutional neural network in classification of hyperspectral images[J/OL]. Computer Engineering and Applications. [2021-01-08] (万亚玲, 钟锡武, 刘慧, 钱育蓉. 2021. 卷积神经网络在高光谱图像分类中的应用综述[J/OL]. 计算机工程与应用. [2021-01-08]. https://kns.cnki.net/kcms/detail/11.2127.TP.20210107.0841.002.html)

Wang Z W, Sun J J, Yu Z Y, Bu Y Y. 2016. Review of remote sensing image classification based on support vector machine. Computer Science, 43(9): 11-17, 31 (王振武, 孙佳骏, 于忠义, 卜异亚. 2016. 基于支持向量机的遥感图像分类研究综述. 计算机科学, 43(9): 11-17, 31) [DOI:10.11896/j.issn.1002-137X.2016.9.002]

Zhang H K, Li Y, Jiang Y N. 2018. Deep learning for hyperspectral imagery classification: the state of the art and prospects. Acta Automatica Sinica, 44(6): 961-977 (张号逵, 李映, 姜晔楠. 2018. 深度学习在高光谱图像分类领域的研究现状与展望. 自动化学报, 44(6): 961-977) [DOI:10.16383/j.aas.2018.c170190]

Zhang L P, Zhang L F, Du B. 2016. Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine, 4(2): 22-40 [DOI:10.1109/MGRS.2016.2540798]

Zhao J, Zhong Y F, Shu H, Zhang L P. 2016. High-resolution image classification integrating spectral-spatial-location cues by conditional random fields. IEEE Transactions on Image Processing, 25(9): 4033-4045 [DOI:10.1109/TIP.2016.2577886]

Zhao W Z, Du S H. 2016. Spectral-spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote Sensing, 54(8): 4544-4554 [DOI:10.1109/TGRS.2016.2543748]

Zhu J Z, Shi Q, Chen F E, Shi X D, Dong Z M, Qin Q Q. 2016. Research status and development trends of remote sensing big data. Journal of Image and Graphics, 21(11): 1425-1439 (朱建章, 石强, 陈风娥, 史晓丹, 董泽民, 秦前清. 2016. 遥感大数据研究现状与发展趋势. 中国图象图形学报, 21(11): 1425-1439) [DOI:10.11834/jig.20161102]