李冠东,张春菊,高飞,张雪英(合肥工业大学土木与水利工程学院, 合肥 230009;南京师范大学虚拟地理环境教育部重点实验室, 南京 210046)
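Objective Hyperspectral remote sensing image data contain rich spatial and spectral information. However, the high dimensionality of the signal, information redundancy, multiple uncertainties, and the fact that the same land cover can show different spectra while different land covers can show the same spectrum make the structure of hyperspectral data highly nonlinear. A 3D-CNN (3D convolutional neural network) can exploit the data-cube nature of hyperspectral remote sensing images, fuse spectral and spatial information, and extract the discriminative features that matter for image classification. We therefore propose a 3D-CNN hyperspectral remote sensing image classification method based on a doubleconvpool (double convolution-pooling) structure. Method The doubleconvpool structure consists of two convolution layers, two BN (batch normalization) layers, and one pooling layer. It takes into account both the scarcity of labeled hyperspectral data and the balance between the high dimensionality of hyperspectral images and model depth. The model makes full use of the semantic information provided jointly by the spatial and spectral domains, which helps extract features from hyperspectral images that are high-dimensional and have few samples. The 3D-CNN based on the doubleconvpool structure takes 3D remote sensing images without feature processing as input, and the resulting deep learning classifier is trained end to end without complex preprocessing. The model also uses regularization strategies such as BN and Dropout to avoid overfitting. Result The experiments compare the model with SVM (support vector machine), SAE (stacked autoencoder), and current mainstream CNN methods. The model achieves best overall classification accuracies of 99.65% and 99.82% on the Indian Pines and Pavia University datasets, respectively, effectively improving the land-cover classification accuracy of hyperspectral remote sensing images. Conclusion We discuss how six factors affect the experimental results: the number of doubleconvpool structures, the regularization strategy, the spectral sampling stride of the first-layer convolution, the convolution kernel size, the neighboring pixel block size, and the learning rate. The proposed doubleconvpool structure can be combined and reused according to the characteristics of the dataset. Compared with other deep learning models, it requires fewer parameters and is more computationally efficient.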
Doubleconvpool-structured 3D-CNN for hyperspectral remote sensing image classification
Li Guandong, Zhang Chunju, Gao Fei, Zhang Xueying (School of Civil Engineering, Hefei University of Technology, Hefei 230009, China; Key Laboratory of Virtual Geographical Environment (Nanjing Normal University), Ministry of Education, Nanjing 210046, China)
Objective Hyperspectral remote sensing image data are rich in spatial and spectral information. Their continuous spectral bands enhance the capability to distinguish between ground objects, and such data have been widely used in image classification, target detection, agricultural monitoring, and environmental management. However, the data structure of hyperspectral remote sensing images is highly nonlinear because of the high dimensionality of the signal, information redundancy, and multiple uncertainties. Classification models based on statistical pattern recognition therefore have difficulty classifying and recognizing raw hyperspectral data directly. Training samples for supervised learning are also extremely limited, and with few training samples the Hughes phenomenon occurs, that is, classification accuracy decreases as the feature dimension increases. Traditional pixel-level hyperspectral remote sensing image classification methods mostly adopt a framework of feature extraction followed by a classifier. For feature extraction, a series of spectral dimensionality reduction methods has been proposed to handle the high spectral dimensionality of hyperspectral data. However, these methods cannot solve the nonlinearity problem of hyperspectral data. Some methods use only spectral information and thus largely neglect the rich spatial structure of high-resolution images. Their classification results often contain many isolated, scattered points, and classification accuracy is greatly reduced. Therefore, introducing spatial information is necessary. In recent years, image classification based on deep learning has become a research hotspot. In contrast to traditional hand-crafted features, deep learning automatically extracts abstract features from low-level details up to high-level semantics and converts images into high-level features that are easy to recognize.
At present, mainstream methods feed images whose dimensionality has been reduced by PCA (principal component analysis) into a 2D-CNN (2D convolutional neural network) and fuse the result with spectral information at a later stage to extract spatial-spectral information. However, these methods extract spatial and spectral information separately, do not take advantage of the joint spatial-spectral information, and require complex preprocessing. Alternatively, a 3D-CNN can extract spatial and spectral information simultaneously. It exploits the data-cube nature of hyperspectral remote sensing images to fully fuse spectral and spatial information, extracts discriminative features that are important for classification, and effectively addresses the problem of spatial homogeneity and heterogeneity. Therefore, using 3D-CNN to extract spatial and spectral information from hyperspectral remote sensing images has become the development trend in image classification. However, existing methods of this kind simply stack convolutional layers, do not fully exploit the strengths of 3D-CNN, and have low model scalability. This study proposes a 3D-CNN model based on a doubleconvpool structure. Method The doubleconvpool structure includes two convolution layers, two BN (batch normalization) layers, and one pooling layer. It considers not only the lack of labeled data for hyperspectral images but also the balance between the high dimensionality of hyperspectral images and model depth. In contrast to methods that use only spatial or only spectral information, the model fully uses the semantic information provided by the joint spatial-spectral information, thereby facilitating feature extraction from hyperspectral images that have few samples and high dimensionality.
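The trade-off between input size and model depth mentioned above can be made concrete with a little shape bookkeeping. The following is a minimal sketch, not the authors' implementation: it only assumes what the abstract states (two convolution layers, two BN layers, one pooling layer per block). The layer ordering, the 3x3x3 kernels with padding 1, the 2x2x2 pooling, and the 200-band, 11x11 input patch are all illustrative assumptions, and the names `Layer`, `doubleconvpool`, `out_shape`, and `trace` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (spectral bands, height, width)


@dataclass
class Layer:
    kind: str                    # "conv3d", "bn" or "pool3d"
    kernel: Shape = (1, 1, 1)
    stride: Shape = (1, 1, 1)
    pad: int = 0


def doubleconvpool(kernel: Shape = (3, 3, 3), pool: Shape = (2, 2, 2)) -> List[Layer]:
    """One block: conv -> BN -> conv -> BN -> pool.

    The ordering, kernel sizes and padding are assumptions for illustration.
    """
    conv = Layer("conv3d", kernel, pad=1)
    return [conv, Layer("bn"), conv, Layer("bn"), Layer("pool3d", pool, pool)]


def out_shape(shape: Shape, layer: Layer) -> Shape:
    """Standard output-size formula: floor((n + 2p - k) / s) + 1 per axis."""
    if layer.kind == "bn":       # batch normalization keeps the shape
        return shape
    return tuple(
        (n + 2 * layer.pad - k) // s + 1
        for n, k, s in zip(shape, layer.kernel, layer.stride)
    )


def trace(shape: Shape, n_blocks: int) -> List[Shape]:
    """Return the feature-map shape after each of n_blocks stacked blocks."""
    shapes = [shape]
    for block in range(1, n_blocks + 1):
        for layer in doubleconvpool():
            shape = out_shape(shape, layer)
            if min(shape) < 1:
                raise ValueError(f"input too small for block {block}")
        shapes.append(shape)
    return shapes


if __name__ == "__main__":
    # Hypothetical patch: 200 bands, 11x11 neighboring pixels.
    for i, s in enumerate(trace((200, 11, 11), 3)):
        print(f"after {i} block(s): {s}")
```

Under these assumptions an 11x11 patch supports at most three blocks, because each pooling step halves the spatial size. This is one way to see why the number of blocks has to be chosen together with the neighboring pixel block size.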
In a 3D-CNN based on the doubleconvpool structure, the 3D remote sensing image is used as input data without feature engineering, and the deep learning model is trained end to end without complicated preprocessing. Moreover, the model uses regularization strategies, such as BN and Dropout, to avoid overfitting. We use the doubleconvpool structure as a standard component of the network and treat the number of such structures as an important hyperparameter. For images with different data characteristics, acceptable classification results are achieved by rationally choosing the number of doubleconvpool structures. The proposed method thus avoids re-tuning the hyperparameters and substantially modifying the network parameters when the network is applied to different datasets, and it enhances the scalability of the network. Result The experiments compare the model with SVM (support vector machine), SAE (stacked autoencoder), and the current mainstream CNN methods. The model achieves overall classification accuracies of up to 99.65% and 99.82% on the Indian Pines and Pavia University datasets, respectively, effectively improving the classification accuracy of hyperspectral remote sensing images. We analyze and discuss the number of doubleconvpool structures, the regularization strategy, the spectral sampling stride of the first-layer convolution, the size of the convolution kernel, the size of neighboring pixel blocks, and the learning rate to provide a reasonable model under different constraints, such as training time and computational cost. Conclusion The doubleconvpool structure can be combined and reused according to the characteristics of the dataset. In comparison with other deep learning models, it requires fewer parameters and has higher computational efficiency. This further illustrates the application potential of deep learning, particularly 3D-CNN, for hyperspectral images.
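The overall classification accuracy reported above is conventionally the fraction of test pixels whose predicted class matches the reference label. As a minimal sketch of that metric, the helper below computes it from two label sequences. The function names and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import Counter
from typing import Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Count (reference label, predicted label) pairs, i.e. a sparse confusion matrix."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled pixel is required")
    return Counter(zip(y_true, y_pred))


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Overall accuracy: correctly classified pixels divided by all test pixels."""
    counts = confusion_counts(y_true, y_pred)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only, not experimental data.
    reference = ["corn", "corn", "grass", "grass", "woods"]
    predicted = ["corn", "grass", "grass", "grass", "woods"]
    print(overall_accuracy(reference, predicted))  # 4 of 5 pixels correct -> 0.8
```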