Published: 2019-01-16 DOI: 10.11834/jig.180437 2019 | Volume 24 | Number 1 NCIG 2018 Special Issue

Received: 2018-07-04; revised: 2018-08-08. Supported by: National Natural Science Foundation of China (61472380); Fundamental Research Funds for the Central Universities (JD2017JGPY0011, JZ2017HGBZ0930). First author: Fang Shuai, born 1978, female, professor, master's supervisor, postdoctoral researcher; her research interests are computer vision and image restoration. E-mail: fangshuai@163.com. Zhu Fengjuan, female, master's student; her research interest is hyperspectral image classification. E-mail: 957735787@qq.com. Zhang Jing, female, associate professor; her research interest is remote sensing information processing. E-mail: zhangjing@hfut.edu.cn. CLC number: TP751.2 Document code: A Article ID: 1006-8961(2019)01-0135-14


Sample optimized selection of hyperspectral image classification
Fang Shuai1,2, Zhu Fengjuan1, Dong Zhangyu1,2, Zhang Jing1
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000, China;
2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230000, China
Supported by: National Natural Science Foundation of China (61472380)

Abstract

Objective In recent years, remote sensing images have found an increasing number of applications, and classification is one of the most widely used operations in hyperspectral image processing. For the traditional hyperspectral classification problem, mainstream improvements have focused on optimizing the classifier or the classification algorithm. Such improvements do not address an important remaining limitation, so we propose a complementary one: while improving the classifier, we also optimize the sample space to obtain a representative set of training samples and thereby secure the overall classification accuracy. Traditional methods consider only the classification algorithm and the classifier and acquire training samples by various random selection schemes. However, the spectral signatures of pixels of the same material can differ. Because of this in-class spectral variability, randomly drawing a fixed proportion of samples from each class cannot guarantee that the selected training samples cover all spectral features of that class. To solve this problem, we propose a sample-space optimization strategy based on in-class reclustering. It guarantees that the selected training samples contain the various spectral curves of each class and that the samples are distributed uniformly over the subclasses of each class. To further improve classification accuracy, the classifier must also be improved. Following the idea of ensemble learning, pixels on which multiple classifiers agree are classified with high accuracy, whereas pixels on which they disagree have a high error rate. Therefore, the low-confidence classification results, whose accuracy is low, are re-optimized using the high-confidence region, whose accuracy is high.
The optimization method is a correction strategy based on neighborhood high-confidence information: the classification results of the high-confidence region are propagated to the low-confidence region, improving both the accuracy in the low-confidence region and the overall accuracy. Because the classification strategy used in this paper is per-pixel classification, neighborhood information is not considered by the base classifiers; in fact, the class label at a pixel generally matches the labels of its neighborhood within the same region. We therefore apply an edge-preserving filter so that the label information at each pixel is smoothed toward that of its neighborhood while region boundaries are preserved, further improving classification accuracy. Method The fuzzy C-means (FCM) clustering algorithm is used to recluster the samples within each class. Because the spectral signatures of same-class samples differ, each class is grouped into several subclasses according to these spectral differences. When selecting samples, we draw from every subclass of every class, which ensures that the training set covers the whole sample space. For the neighborhood high-confidence correction strategy, an edge-preserving filter propagates the information of the high-confidence region to optimize the low-confidence region. First, two simple classifiers, a support vector machine (SVM) and a sparse-representation-based classifier (SRC), are applied, and the consistency of their results is tested: pixels on which the two classifiers agree form the high-confidence region, and pixels on which they disagree form the low-confidence region.
The results in the low-confidence region are then optimized with the edge-preserving filter. The hyperspectral image is first processed by principal component analysis, and the first principal component, which contains most of the structural information of the image, is used as the guidance image. The high-confidence region is then filtered, and the high-confidence information propagates into the comparatively small low-confidence point set. In this way, each low-confidence pixel obtains a new class label that replaces its original low-confidence label, correcting the classification result in that region. The edge-preserving filter smooths within regions while preserving edges. These strategies greatly improve the classification result. Moreover, even when only a small proportion of training samples is selected, optimizing the sample space first keeps the classifier well trained and the classification accuracy stable. Result The experiments used three datasets: India Pines, Salinas, and PaviaU. Two groups of experiments with different sample proportions compared classic classification algorithms with the proposed one. In the first experiment, 10%, 1%, and 1% of the samples were used for training, and the overall accuracy (OA) on the three datasets reached 98.93%, 99.78%, and 99.40%, respectively, about 1% higher than the best competing algorithms. In the second, small-proportion experiment, the proportions were set to 1%, 0.3%, and 0.4%, and the OA values for India Pines, Salinas, and PaviaU reached 90.48%, 99.68%, and 98.54%, which is 4%-6% higher than those of the other algorithms at the same proportions.
These results suggest that the proposed algorithm outperforms the other algorithms and that its advantage is largest in experiments with small sample proportions. Conclusion In this paper, a representative and balanced training set is selected through a sample-space optimization strategy, which keeps the classification accuracy high even for small-proportion samples. The correction strategy based on neighborhood high-confidence information provides a good optimization effect, and the algorithm adapts to many datasets with good robustness. In summary, when the sample proportion is reduced, the accuracy of traditional classification algorithms declines rapidly and becomes unstable, whereas the proposed algorithm retains a clear advantage, ensuring both high accuracy and stable classification results.

Key words

remote sensing; hyperspectral classification; spectral characteristic; sample space optimization; class reclustering; high confidence region; edge protection filtering

0 Introduction

1) It suits small sample proportions, which shortens the running time;

2) The sample space is optimized, so the classification results remain good even when the sample proportion is very small;

3) The edge-preserving filter corrects the labels of the low-confidence region while keeping the structure of the original image, yielding a high-accuracy classification map;

4) Obtaining sample points from the high-confidence region provides points with more reliable labels, which makes the edge-preserving filter markedly more effective.
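Items 3) and 4) can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes a grayscale guide in [0, 1] (e.g. the first principal component), uses a basic box-mean grayscale guided filter as the edge-preserving filter, and the helper names (`boxmean`, `guided_filter`, `correct_low_confidence`) are ours.

```python
import numpy as np

def boxmean(img, r):
    """Mean over a (2r+1) x (2r+1) window with replicate padding."""
    H, W = img.shape
    p = np.pad(img, r, mode="edge")
    out = np.zeros((H, W), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + H, dx:dx + W]
    return out / (2 * r + 1) ** 2

def guided_filter(guide, src, r, eps):
    """Grayscale guided filter: edge-preserving smoothing of src by guide."""
    mI, mp = boxmean(guide, r), boxmean(src, r)
    cov = boxmean(guide * src, r) - mI * mp   # covariance of guide and src
    var = boxmean(guide * guide, r) - mI * mI # variance of guide
    a = cov / (var + eps)
    b = mp - a * mI
    return boxmean(a, r) * guide + boxmean(b, r)

def correct_low_confidence(hcs_map, guide, r=3, eps=1e-3):
    """Smooth each class's high-confidence votes (0 marks low confidence)
    with the guided filter and relabel every pixel by the largest vote."""
    classes = np.unique(hcs_map)
    classes = classes[classes != 0]
    votes = [guided_filter(guide, (hcs_map == c).astype(float), r, eps)
             for c in classes]
    return classes[np.argmax(votes, axis=0)]
```

Because the guide carries the image structure, the smoothed votes stay within region boundaries, so a low-confidence pixel inherits the dominant high-confidence label of its own region rather than of a neighboring one.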

2.2.1 Mean dimensionality reduction

1) Divide the $N$-band HSI image $\mathit{\boldsymbol{I}}$ into $M$ groups of $\left\lceil {N/M} \right\rceil$ bands each; if the last group falls short of this length, it consists of the remaining bands.

2) Average the $\left\lceil {N/M} \right\rceil$ bands within each group point-wise (the last group is averaged over its actual number of bands). Each group yields one single-band image, producing an $M$-band denoised hyperspectral image.
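The two steps above can be sketched as follows; this is a minimal illustration assuming an (H, W, N) image cube, with `mean_reduce` as our own function name:

```python
import numpy as np

def mean_reduce(hsi, M):
    """Average groups of ceil(N/M) bands of an (H, W, N) cube down to M bands.

    The last group simply averages whatever bands remain, as in step 2).
    Assumes M <= N, so no trailing group is empty.
    """
    H, W, N = hsi.shape
    g = -(-N // M)  # ceil(N / M): bands per group
    out = np.empty((H, W, M), dtype=float)
    for m in range(M):
        out[..., m] = hsi[..., m * g:(m + 1) * g].mean(axis=2)
    return out
```

For example, a 6-band cube reduced with M = 3 averages bands (0, 1), (2, 3), and (4, 5) into three output bands.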

2.2.2 Intrinsic image decomposition

 $\mathit{\boldsymbol{I}} = \mathit{\boldsymbol{S}} \cdot \mathit{\boldsymbol{R}}$ (1)

On the basis of the $M$-band reduced image, intrinsic image decomposition is performed on groups of $Z$ channels, yielding the $M$-band reflectance image $\mathit{\boldsymbol{R}}$ and the corresponding $\left\lceil {N/Z} \right\rceil$-band shading image $\mathit{\boldsymbol{S}}$. The resulting $\mathit{\boldsymbol{R}}$ is the $\mathit{\boldsymbol{R}}$ in Fig. 3.

2.3 In-class reclustering

1) In-class sample reclustering: the samples of each class are clustered into $k$ subclasses by FCM[18] (Sub_samples in Fig. 3). $k$ is chosen according to the proportion of each class among all samples, with $k_0$ as a tuning parameter: for the same $k_0$, the larger a class's proportion, the larger $k$, and vice versa.

2) Balanced sample selection: when a class is clustered into several subclasses, samples are drawn proportionally from each subclass, ensuring that the selection spans all subclasses. This yields the training set (Training samples in Fig. 3).
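Steps 1)-2) can be sketched compactly. The FCM below is a minimal implementation in the spirit of Bezdek[18]; the function names, the fixed fuzzifier m = 2, and the "at least one sample per subclass" rule are our own assumptions, not the paper's exact settings.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means: returns (centers, memberships U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # columns sum to 1
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=0))  # standard FCM update
    return centers, U

def balanced_sample(X, c, ratio, seed=0):
    """Split one class into c subclasses with FCM, then draw ~ratio of each
    subclass so every spectral variant is represented in the training set."""
    _, U = fcm(X, c, seed=seed)
    sub = U.argmax(axis=0)                   # hard subclass labels
    rng = np.random.default_rng(seed)
    picked = []
    for k in range(c):
        idx = np.flatnonzero(sub == k)
        if idx.size == 0:
            continue
        take = max(1, round(ratio * idx.size))  # at least one per subclass
        picked.extend(rng.choice(idx, size=take, replace=False))
    return sorted(picked)
```

Drawing per subclass rather than per class is the point: a rare spectral variant that random class-level sampling would miss is guaranteed at least one representative.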

Table 1 Classification accuracy of each class before and after sample space optimization with 1% training samples on India Pines

 Class  Samples  Accuracy (unoptimized)  Accuracy (optimized)  Improved
 1      1        0.9783                  1.0000                1
 2      15       0.6887                  0.8641                1
 3      9        0.6384                  0.7156                1
 4      3        0.8591                  0.9518                1
 5      5        0.6876                  1.0000                1
 6      8        0.9893                  0.9903                1
 7      1        0.2126                  0.6000                1
 8      5        1.0000                  1.0000                1
 9      1        1.0000                  0.6552                0
 10     10       0.7196                  0.8954                1
 11     25       0.9512                  0.8888                0
 12     6        0.7896                  0.7486                0
 13     3        0.9663                  1.0000                1
 14     13       0.8990                  0.9952                1
 15     4        0.5443                  0.9973                1
 16     1        1.0000                  0.9684                0
 Note: classes 1-16 correspond to the classes in Fig. 9(a): 1 (Alfalfa); 2 (Corn-notill); 3 (Corn-mintill); 4 (Corn); 5 (Grass/pasture); 6 (Grass/trees); 7 (Grass/pasture-mowed); 8 (Hay-windrowed); 9 (Oats); 10 (Soybeans-notill); 11 (Soybeans-mintill); 12 (Soybeans-clean till); 13 (Wheat); 14 (Woods); 15 (Bldg-Grass-Tree-Drives); 16 (Stone-steel towers).

2.4.1 High-confidence region

 $\mathit{HCS}\left( {x_1}, {x_2} \right) = \begin{cases} {x_1}, & {x_1} = {x_2} \\ 0, & {x_1} \ne {x_2} \end{cases}$ (2)
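Eq. (2) applies directly to the two classifiers' full label maps; a small sketch (the function name `hcs` is our own), where 0 marks the low-confidence set:

```python
import numpy as np

def hcs(labels1, labels2):
    """Eq. (2): keep a pixel's label where the two classifiers (e.g. SVM and
    SRC) agree; 0 marks the low-confidence set to be corrected later."""
    labels1 = np.asarray(labels1)
    labels2 = np.asarray(labels2)
    return np.where(labels1 == labels2, labels1, 0)
```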

4) Parameters of the intrinsic image decomposition. For the three datasets, $M$=32 and $Z$=4. The specific parameters are listed in Table 2.

Table 2 Experimental parameters

 Dataset      Training samples  Clustering      IID           EPFs
                                $b$    $k_0$    $M$    $Z$    $r$    $\varepsilon$
 India Pines  10%               2      100      32     4      3      0.001
 Salinas      1%                2      100      32     4      3      0.001
 PaviaU       1%                2      90       32     4      3      0.001

3.4.1 Comparison experiments

1) The results on India Pines are shown in Fig. 10. The SVM result contains many noise points; EPFs denoises better than SVM but still leaves many erroneous regions; the IID map has no large erroneous regions but relatively obvious noise; LCMR shows little noise but a few erroneous regions; MFASR is noise-free but still contains a few large erroneous detail blocks. The proposed method both denoises effectively and reduces erroneous blocks, giving the best classification map, with an accuracy of 98.93%.

2) The results on Salinas are shown in Fig. 11. Again, the proposed algorithm balances the classification quality at class boundaries and within class regions and performs best.

3) The results on PaviaU are shown in Fig. 12. The optimization effect mirrors that on India Pines and Salinas, and the proposed algorithm again performs best.

3.4.2 Comparison experiments with small sample proportions

Table 3 Comparison of indicators on India Pines using 1% training samples

 Unit: %
 Metric  SVM    EPFs   MFASR  IID    PCA-EPFs  LCMR   Proposed
 OA      56.61  63.37  72.06  82.39  83.57     84.43  90.48
 AA      58.21  52.44  70.70  82.08  88.23     85.33  89.71
 Kappa   49.46  56.81  67.89  81.66  81.41     82.25  89.14
 Note: bold font indicates the best result for each metric.

Table 4 Comparison of indicators on Salinas using 0.3% training samples

 Unit: %
 Metric  SVM    EPFs   MFASR  IID    PCA-EPFs  LCMR   Proposed
 OA      84.50  87.87  86.39  98.81  97.06     95.53  99.68
 AA      90.06  93.12  90.85  98.84  98.16     96.22  99.44
 Kappa   82.68  86.42  84.82  98.67  96.74     95.02  99.64
 Note: bold font indicates the best result for each metric.

Table 5 Comparison of indicators on PaviaU using 0.4% training samples

 Unit: %
 Metric  SVM    EPFs   MFASR  IID    PCA-EPFs  LCMR   Proposed
 OA      82.80  88.37  78.06  97.57  95.01     94.57  98.54
 AA      79.95  88.89  76.44  97.73  91.68     91.64  98.60
 Kappa   76.84  84.15  70.67  96.77  93.43     92.81  98.07
 Note: bold font indicates the best result for each metric.

Table 6 Running time under different training sample proportions

 Dataset         India Pines      Salinas           PaviaU
 Proportion      10%      1%      1%       0.30%    1%       0.40%
 Time/s          12.22    2.17    18.09    13.95    15.35    11.83

4 Conclusion

1) A classification framework based on the idea of ensemble learning. Two simple classifiers determine the high-confidence and low-confidence regions. Edge-preserving filtering then re-optimizes the classification results of the low-confidence region, assigning labels according to the information of the neighboring high-confidence region, which improves classification accuracy.

2) An in-class reclustering strategy obtains representative and balanced training samples.

3) Intrinsic image decomposition as preprocessing. It removes shadows, textures, and other spatial information that is produced by illumination and surface geometry and is irrelevant to the material itself, so that the spectral features used for classification reflect the essential characteristics of the material, benefiting subsequent classification.
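As a toy illustration of the multiplicative model $\mathit{\boldsymbol{I}} = \mathit{\boldsymbol{S}} \cdot \mathit{\boldsymbol{R}}$ of Eq. (1): the paper uses the optimization-based decomposition of Shen et al.[16], but the idea can be mimicked per band with a crude Retinex-style split, where a heavy local mean stands in for the smooth shading $S$ and the reflectance is recovered as $R = I / S$. The function name, the box-mean smoothing, and the parameters here are our own simplifications, not the paper's method.

```python
import numpy as np

def crude_intrinsic_split(band, r=5, eps=1e-6):
    """Approximate I = S * R for one band: S = local mean (a smooth shading
    stand-in), R = I / S. Not the optimization-based IID of the paper."""
    H, W = band.shape
    p = np.pad(band, r, mode="edge")
    S = np.zeros((H, W), dtype=float)
    for dy in range(2 * r + 1):           # box mean over (2r+1)^2 window
        for dx in range(2 * r + 1):
            S += p[dy:dy + H, dx:dx + W]
    S /= (2 * r + 1) ** 2
    R = band / (S + eps)                  # reflectance: shading divided out
    return S, R
```

On a shading-free (constant) band this split returns $S$ equal to the band and $R \approx 1$ everywhere, i.e. all spatial variation is attributed to shading and none to reflectance.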

References

• [1] Fang L Y, Li S T, Kang X D, et al. Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(12): 7738–7749. [DOI:10.1109/TGRS.2014.2318058]
• [2] Kang X D, Li S T, Fang L Y, et al. Extended random walker-based classification of hyperspectral images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(1): 144–153. [DOI:10.1109/TGRS.2014.2319373]
• [3] Chang C C, Lin C J. LIBSVM:a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): #27. [DOI:10.1145/1961189.1961199]
• [4] Fauvel M, Benediktsson J A, Chanussot J, et al. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles[J]. IEEE Transactions on Geoscience and Remote Sensing, 2008, 46(11): 3804–3814. [DOI:10.1109/TGRS.2008.922034]
• [5] Fang L Y, He N J, Li S T, et al. Extinction profiles fusion for hyperspectral images classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(3): 1803–1815. [DOI:10.1109/TGRS.2017.2768479]
• [6] Ghamisi P, Souza R, Benediktsson J A, et al. Extinction profiles for the classification of remote sensing data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(10): 5631–5645. [DOI:10.1109/TGRS.2016.2561842]
• [7] Fang L Y, Wang C, Li S T, et al. Hyperspectral image classification via multiple-feature-based adaptive sparse representation[J]. IEEE Transactions on Instrumentation and Measurement, 2017, 66(7): 1646–1657. [DOI:10.1109/TIM.2017.2664480]
• [8] Fang L Y, Li S T, Duan W H, et al. Classification of hyperspectral images by exploiting spectral-spatial information of superpixel via multiple kernels[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(12): 6663–6674. [DOI:10.1109/TGRS.2015.2445767]
• [9] Lu T, Li S T, Fang L Y, et al. Set-to-set distance-based spectral-spatial classification of hyperspectral images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7122–7134. [DOI:10.1109/TGRS.2016.2596260]
• [10] Li J, Bioucas-Dias J M, Plaza A. Hyperspectral image segmentation using a new Bayesian approach with active learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(10): 3947–3960. [DOI:10.1109/TGRS.2011.2128330]
• [11] Lou X L, Huang W G, Zhou C B, et al. A method for fast resampling of remote sensing imagery[J]. Journal of Remote Sensing, 2002, 6(2): 96–101. [楼琇林, 黄韦艮, 周长宝, 等. 遥感图像数据重采样的一种快速算法[J]. 遥感学报, 2002, 6(2): 96–101. ] [DOI:10.11834/jrs.20020204]
• [12] Han H, Wang W Y, Mao B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]//Proceedings of 2005 International Conference on Advances in Intelligent Computing. Hefei, China: Springer, 2005: 878-887.[DOI: 10.1007/11538059_91]
• [13] Cai Z W, Fan Q F, Feris R S, et al. A unified multi-scale deep convolutional neural network for fast object detection[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 354-370.[DOI: 10.1007/978-3-319-46493-0_22]
• [14] Sun B, Kang X D, Li S T, et al. Random-walker-based collaborative learning for hyperspectral image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(1): 212–222. [DOI:10.1109/TGRS.2016.2604290]
• [15] Qiao T, Ren J C, Wang Z, et al. Effective denoising and classification of hyperspectral images using curvelet transform and singular spectrum analysis[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(1): 119–133. [DOI:10.1109/TGRS.2016.2598065]
• [16] Shen J B, Yang X S, Li X L, et al. Intrinsic image decomposition using optimization and user scribbles[J]. IEEE Transactions on Cybernetics, 2013, 43(2): 425–436. [DOI:10.1109/TSMCB.2012.2208744]
• [17] Kang X D, Li S T, Fang L Y, et al. Intrinsic image decomposition for feature extraction of hyperspectral images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(4): 2241–2253. [DOI:10.1109/TGRS.2014.2358615]
• [18] Bezdek J C, Ehrlich R, Full W. FCM:the fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2-3): 191–203. [DOI:10.1016/0098-3004(84)90020-7]
• [19] Li S T, Kang X D, Hu J W. Image fusion with guided filtering[J]. IEEE Transactions on Image Processing, 2013, 22(7): 2864–2875. [DOI:10.1109/TIP.2013.2244222]
• [20] Villa A, Benediktsson J A, Chanussot J, et al. Hyperspectral image classification with independent component discriminant analysis[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(12): 4865–4876. [DOI:10.1109/TGRS.2011.2153861]
• [21] Fang L Y, He N J, Li S T, et al. A new spatial-spectral feature extraction method for hyperspectral images using local covariance matrix representation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(6): 3534–3546. [DOI:10.1109/TGRS.2018.2801387]
• [22] Kang X D, Xiang X L, Li S T, et al. PCA-Based edge-preserving features for hyperspectral image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(12): 7140–7151. [DOI:10.1109/TGRS.2017.2743102]