Fang Shuai, Zhu Fengjuan, Dong Zhangyu, Zhang Jing (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000, China; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230000, China)
Objective In hyperspectral image classification, intra-class spectral variability means that a conventional random sample selection strategy cannot guarantee that the training samples uniformly cover the sample space. To address this problem, a sample space optimization strategy based on intra-class reclustering is proposed. To further improve classification accuracy, a correction strategy based on high-confidence neighborhood information is proposed for low-confidence classification results. Method The fuzzy C-means (FCM) clustering algorithm is applied to recluster the samples of each class, and suitable samples are selected from every resulting subclass. Two simple classifiers, a support vector machine (SVM) and a sparse representation-based classifier (SRC), are then used in a consistency check on the classification results to identify high- and low-confidence regions. For the low-confidence regions, the first principal component image is used as the guide image to filter the confidence maps, so that high-confidence information propagates into the low-confidence regions and corrects their classification results. These strategies ensure that a strong classifier can be trained even with few training samples, substantially improving classification accuracy. Result Using three datasets, two groups of experiments with different sample proportions were set up to compare the proposed algorithm with classic and state-of-the-art classification algorithms. The results show that the proposed algorithm achieves clear improvements, especially in the experiments with small sample proportions. In the small-proportion experiments (about one tenth of the commonly used proportion), the overall accuracy (OA) reaches 90.48% on the Indian Pines dataset, 99.68% on the Salinas dataset, and 98.54% on the PaviaU dataset, 4%-6% higher than the other algorithms on all three datasets. Conclusion The proposed sample space optimization strategy selects representative, balanced samples, keeping classification accuracy high even with small sample proportions, and the correction strategy based on high-confidence neighborhood information provides effective refinement. The algorithm also adapts to multiple datasets and shows good robustness.
Optimized sample selection for hyperspectral image classification
Fang Shuai, Zhu Fengjuan, Dong Zhangyu, Zhang Jing (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000, China; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230000, China)
Objective Remote sensing images are being used in a growing number of applications, and hyperspectral image classification is one of the most widely used hyperspectral processing techniques. For the traditional hyperspectral classification problem, mainstream improvements have focused on optimizing the classifier or the classification algorithm, which leaves an important limitation unaddressed. We therefore propose a complementary improvement: in addition to improving the classifier, the sample space is optimized to obtain a representative set of training samples that safeguards overall classification accuracy. Traditional methods consider only the classification algorithm and the classifier, and acquire training samples by random selection. However, the spectral signatures of even the same material differ. Because of this intra-class spectral variability, the conventional strategy of randomly selecting a fixed proportion of samples from each class cannot guarantee that the selected training samples cover the complete range of spectral features. To solve this problem, a sample space optimization strategy based on intra-class reclustering is proposed. It guarantees that the selected training samples contain the various spectral curves of each class and are distributed uniformly over each subclass of each class. Moreover, to further improve classification accuracy, the classifier itself must also be improved. Ensemble learning suggests that points on which multiple classifiers agree are classified with high accuracy, whereas points on which they disagree have a high error rate. Therefore, the low-confidence classification results, where accuracy is low, are re-optimized using the high-confidence regions, where accuracy is high.
In this paper, the optimization takes the form of a correction strategy based on high-confidence neighborhood information: the classification results of the high-confidence regions are used to refine those of the low-confidence regions, improving both the accuracy in the low-confidence regions and the overall classification accuracy. Because the classification strategy used here is point-wise, neighborhood information is not initially considered, although the category of a given point generally matches the category information of its surrounding region. We therefore use an edge-preserving filter to smooth the per-class information while protecting edges, which makes the information at each point consistent with its neighborhood and further improves classification accuracy. Method The fuzzy C-means (FCM) clustering algorithm is used to recluster the samples within each class. Because the spectral characteristics of samples of the same class differ, each class is grouped into several subclasses according to these spectral differences. During sample selection, samples are drawn from every subclass of every class, ensuring that the training set covers the entire sample space. For the correction strategy based on high-confidence neighborhood information, an edge-preserving filter is used to optimize the low-confidence information with the information of the high-confidence regions. First, two simple classifiers, a support vector machine (SVM) and a sparse representation-based classifier (SRC), are applied, and the consistency of their results is tested: the set of points on which the two classifiers agree forms the high-confidence region, and the set of points on which they disagree forms the low-confidence region.
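The intra-class reclustering and balanced selection step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`fcm`, `select_training_indices`), the fixed number of subclasses per class, and the even split of the per-class budget over subclasses are all illustrative assumptions.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain NumPy fuzzy C-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)           # rows of U sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # avoid division by zero
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))  # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

def select_training_indices(X, labels, ratio, n_sub=3, seed=0):
    """Recluster each class with FCM and draw training samples from every subclass,
    so that the selection covers the whole sample space of each class."""
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = min(n_sub, len(idx))
        _, U = fcm(X[idx], k, seed=seed)
        sub = U.argmax(axis=1)                  # hard subclass assignment
        n_pick = max(1, int(round(ratio * len(idx))))
        for s in range(k):
            s_idx = idx[sub == s]
            if len(s_idx) == 0:
                continue
            take = min(max(1, n_pick // k), len(s_idx))
            chosen.extend(rng.choice(s_idx, size=take, replace=False))
    return np.array(sorted(set(int(i) for i in chosen)))
```

Sampling per subclass, rather than per class, is what keeps the training set balanced when one spectral variant dominates a class.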
The results in the low-confidence region are then optimized with the edge-preserving filter. The hyperspectral image is first processed by principal component analysis; because the first principal component contains most of the structural information of the image, it is used as the guide image. The high-confidence region is then filtered so that high-confidence information propagates to the relatively few points of the low-confidence region. In this way, each low-confidence point receives new category information that replaces its original low-confidence label, correcting the classification results of the low-confidence region. The guided filter used here has the edge-preserving smoothing property. The above strategies greatly improve the classification results. In addition, even when only a small proportion of training samples is selected, a strong classifier can be trained after the sample space is optimized, which keeps the classification accuracy stable. Result The experiments used three datasets: Indian Pines, Salinas, and PaviaU. Two groups of experiments with different sample selection proportions were set up to compare classic classification algorithms with the proposed one. In the first experiment, we selected 10%, 1%, and 1% of the samples for training. The overall accuracy (OA) on the three datasets reached 98.93%, 99.78%, and 99.40%, respectively, about 1% higher than the best competing algorithms. In the second, small-proportion experiment, we set the sample proportions to 1%, 0.3%, and 0.4%. The OA values for the Indian Pines, Salinas, and PaviaU datasets reached 90.48%, 99.68%, and 98.54%, respectively, 4%-6% higher than those of the other algorithms at the same proportions.
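The correction step described above (consistency check, first principal component as guide, guided filtering of per-class confidence maps) can be sketched as follows. This is a simplified illustration, not the authors' implementation: hard labels from the two classifiers stand in for their full outputs, the gray-scale guided filter of He et al. is built directly from box filters, and names such as `correct_low_confidence` are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r, eps):
    """Gray-scale guided filter: edge-preserving smoothing of p, guided by I."""
    size = 2 * r + 1
    mean_I = uniform_filter(I, size)
    mean_p = uniform_filter(p, size)
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    var_I = uniform_filter(I * I, size) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)                  # local linear model p ~ a*I + b
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * I + uniform_filter(b, size)

def correct_low_confidence(cube, pred_svm, pred_src, n_classes, r=4, eps=1e-4):
    """Keep labels where the two classifiers agree (high confidence); elsewhere,
    relabel from guided-filtered per-class confidence maps (first PC as guide)."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)
    flat = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    guide = (flat @ vt[0]).reshape(h, w)        # first principal component image
    guide = (guide - guide.min()) / (guide.max() - guide.min() + 1e-12)
    agree = pred_svm == pred_src                # high-confidence mask
    scores = np.zeros((n_classes, h, w))
    for c in range(n_classes):
        # filter the binary high-confidence map of class c; edges in the guide
        # stop confidence from leaking across region boundaries
        scores[c] = guided_filter(guide, (agree & (pred_svm == c)).astype(float), r, eps)
    out = pred_svm.copy()
    out[~agree] = scores.argmax(axis=0)[~agree]
    return out
```

Because the filter is guided by the structural image, high-confidence labels propagate only within homogeneous regions, which is the intended edge-preserving behavior.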
These results suggest that the proposed algorithm outperforms the competing algorithms, with a particularly large improvement in the experiments with small sample proportions. Conclusion The proposed sample space optimization strategy selects representative, balanced training samples, which keeps the classification accuracy high even with small sample proportions, and the correction strategy based on high-confidence neighborhood information provides an effective refinement. The algorithm also adapts well to multiple datasets and shows good robustness. In summary, as the sample proportion shrinks, the accuracy of traditional classification algorithms declines rapidly and becomes unstable, whereas the proposed algorithm retains clear advantages, ensuring both high accuracy and stable classification results.