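Fang Shuai, Zhu Fengjuan, Dong Zhangyu, Zhang Jing (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230000)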
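Objective In hyperspectral classification, spectral characteristics vary within each class. As a result, the conventional random sample selection strategy cannot guarantee that the training samples cover the sample space uniformly. To address this problem, a sample space optimization strategy based on in-class re-clustering is proposed. To further improve classification accuracy, a correction strategy based on neighborhood high-confidence information is also proposed for low-confidence classification results. Method The FCM (Fuzzy C-means) clustering algorithm is used to re-cluster the samples of each class, and appropriate samples are selected from each resulting subclass. Two simple classifiers, SVM (Support Vector Machine) and SRC (Sparse Representation-based Classifier), are used to check the consistency of the classification results and to determine the high-confidence and low-confidence regions. In the low-confidence regions, the confidence map is filtered with the principal component image as the guide image. High-confidence information therefore propagates into the low-confidence regions and corrects their classification results. With these strategies, a high-accuracy classifier can be trained even from few training samples, and classification accuracy improves substantially. Result Three datasets are used in the experiments. Two groups of experiments are set up according to the training-sample proportion and compared with classical and recent classification algorithms. The experimental results show that the proposed algorithm achieves substantial improvements, especially in the experiments with a small sample proportion. With a small proportion of training samples (about one tenth of the usual proportion), the OA (Overall Accuracy) reaches 90.48% on India Pines, 99.68% on Salinas and 98.54% on PaviaU. On all three datasets, this is 4%-6% higher than other algorithms at the same proportion. Conclusion These results show that the sample space optimization strategy selects representative and balanced samples, so classification accuracy remains high even with a small proportion of samples. The correction strategy based on neighborhood high-confidence information provides a clear further improvement. The proposed algorithm also adapts to multiple datasets and is robust.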
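The in-class re-clustering step can be illustrated with a minimal sketch. Several details are assumptions, because the abstract does not specify them: the number of subclasses per class (`n_sub=3`), the FCM fuzzifier (`m=2`), and the way the per-class budget is divided among subclasses (round-robin here). The function names `fcm` and `select_training_samples` are hypothetical. Each class is clustered with standard fuzzy c-means, every sample is assigned to its highest-membership subclass, and training samples are then drawn one per subclass per round until the class budget is used up.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on the rows of X (n_samples, n_bands).

    Returns the cluster centers (c, n_bands) and the membership matrix U (n_samples, c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row of memberships sums to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared Euclidean distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                  # guard against a sample sitting on a center
        # u_ik is proportional to d_ik^(-2/(m-1)), i.e. d2_ik^(-1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def select_training_samples(X, y, ratio, n_sub=3, seed=0):
    """Pick about `ratio` of each class, spread round-robin over its FCM subclasses.

    Hypothetical helper: subclass count and round-robin allocation are assumptions.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        n_pick = min(idx.size, max(1, int(round(ratio * idx.size))))
        k = min(n_sub, idx.size)
        _, U = fcm(X[idx], k, seed=seed)
        sub = U.argmax(axis=1)                   # hard subclass label for each sample
        groups = [rng.permutation(idx[sub == s]) for s in range(k) if np.any(sub == s)]
        picked, pos = [], 0
        while len(picked) < n_pick:              # one sample per subclass per round
            for g in groups:
                if pos < g.size and len(picked) < n_pick:
                    picked.append(int(g[pos]))
            pos += 1
        chosen.extend(picked)
    return np.array(sorted(chosen))
```

For example, `select_training_samples(X, y, ratio=0.01)` would draw about 1% of each class. Within each class, the picks are spread over up to `n_sub` spectral subclasses instead of being drawn from the class as a whole. Round-robin allocation was chosen because it directly enforces "uniform over subclasses". A size-proportional split would be an equally plausible reading.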
Optimized sample selection for hyperspectral image classification
Fang Shuai, Zhu Fengjuan, Dong Zhangyu, Zhang Jing (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230000; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230000)
Objective In recent years, remote sensing images have been used in a growing number of applications, and classification is one of the most widely used methods for processing hyperspectral images. In hyperspectral classification, mainstream improvements focus on optimizing either the classifier or the classification algorithm. This treats the symptoms rather than the root cause, so we propose a different improvement. While improving the classifier, we also optimize the sample space to obtain a representative set of training samples that secures the overall classification accuracy. Traditional methods consider only the classification algorithm and the classifier, and they acquire samples by some form of random selection. However, even pixels of the same material differ in their spectral properties. Because of these within-class spectral differences, the conventional strategy of randomly selecting a fixed proportion of samples from each class cannot guarantee that the selected training samples cover the complete spectral characteristics of each class. To solve this problem, a sample space optimization strategy based on in-class re-clustering is proposed. It ensures that the selected training samples contain the various spectral curves of each class and are distributed uniformly over the subclasses of each class. Improving the classifier is also crucial for further raising classification accuracy. Ensemble learning suggests that pixels on which multiple classifiers agree are classified more accurately, whereas pixels on which they disagree have a higher error rate. Therefore, the low-confidence classification results, which are less accurate, are further refined using the more accurate high-confidence regions.
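The agreement rule described above can be sketched in a few lines. This assumes that the two classifiers' outputs are available as integer label images of the same shape. The function name `confidence_maps` is hypothetical. Encoding low-confidence pixels as all-zero class vectors is also an assumption: it is a convenient input for a later filtering step, not a representation the abstract specifies.

```python
import numpy as np


def confidence_maps(labels_a, labels_b, n_classes):
    """Split pixels into high/low confidence by classifier agreement.

    Returns the boolean high-confidence mask and an (H, W, n_classes) array that is
    one-hot at agreed pixels and all zeros at disagreeing (low-confidence) pixels.
    """
    high = labels_a == labels_b                  # both classifiers agree -> high confidence
    maps = np.zeros(labels_a.shape + (n_classes,))
    rows, cols = np.nonzero(high)
    maps[rows, cols, labels_a[rows, cols]] = 1.0  # mark the agreed class only
    return high, maps
```

Pixels where the two classifiers disagree carry no class evidence at this stage. They are left to be filled in from their high-confidence neighbors.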
The refinement used in this paper is a correction strategy based on neighborhood high-confidence information. The classification results of the high-confidence regions are used to correct those of the low-confidence regions, which raises the accuracy in the low-confidence regions and therefore the overall classification accuracy. The classifiers used in this paper label each pixel independently, so neighborhood information is not considered. In fact, the class of a pixel is influenced by its neighborhood, and pixels within the same homogeneous region tend to share the same class. We therefore use an edge-preserving filter to smooth the information of each class while preserving edges. This makes the information at each pixel consistent with that of its neighborhood and improves the classification accuracy further. Method In this paper, the FCM (Fuzzy C-means) clustering algorithm is used to re-cluster the samples of each class. Samples of the same class differ in their spectral characteristics, so each class is divided into several subclasses according to these differences. When selecting samples, we ensure that samples are drawn from every subclass of every class, so that the training samples cover the entire sample space. For the correction strategy based on neighborhood high-confidence information, this paper uses an edge-preserving filter to correct low-confidence information with information from the high-confidence regions. First, two simple classifiers, an SVM (Support Vector Machine) classifier and an SRC (Sparse Representation-based Classifier), are applied, and the consistency of their classification results is checked. The set of pixels on which the two results agree forms the high-confidence region, and the set on which they disagree forms the low-confidence region.
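The SRC classifier mentioned above can be sketched as follows. The abstract names SRC but not its sparse solver. Common SRC formulations solve an l1-regularized problem; this sketch swaps in orthogonal matching pursuit (OMP) with a fixed sparsity level (`n_nonzero`) to keep it dependency-free, and that choice is an assumption. The function names `omp` and `src_predict` are hypothetical. Training spectra form the dictionary columns. Each test spectrum is sparsely coded over the dictionary and assigned to the class whose coefficients alone reconstruct it with the smallest residual.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: sparse a with at most n_nonzero atoms, D @ a ~= x.

    D has unit-norm columns (one column per training spectrum).
    """
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break                                   # no new atom helps; stop early
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


def src_predict(train_X, train_y, test_X, n_nonzero=5):
    """Label each test spectrum by the smallest class-wise reconstruction residual."""
    D = train_X.T / np.linalg.norm(train_X, axis=1)  # columns = normalized training spectra
    classes = np.unique(train_y)
    preds = []
    for x in test_X:
        x = x / np.linalg.norm(x)
        a = omp(D, x, n_nonzero)
        # keep only the coefficients belonging to class c and measure the leftover error
        res = [np.linalg.norm(x - D[:, train_y == c] @ a[train_y == c]) for c in classes]
        preds.append(classes[int(np.argmin(res))])
    return np.array(preds)
```

In the consistency check, the output of `src_predict` would be compared pixel by pixel with an SVM's labels. The SVM is omitted here so that the sketch depends only on NumPy.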
The results in the low-confidence region are then refined with the edge-preserving filter. First, PCA (Principal Component Analysis) is applied to the hyperspectral image to obtain the first principal component. The first principal component contains most of the structural information of the image, so it is used as the guide image. The confidence maps derived from the high-confidence region are then filtered, and high-confidence information propagates into the relatively small set of low-confidence pixels. Each low-confidence pixel thereby receives new class information that replaces its original low-confidence label, which corrects the classification results of the low-confidence region. Because the filter smooths while preserving edges, the propagated information largely respects object boundaries. With these strategies, the classification results improve substantially. Even when only a small proportion of training samples is selected, the samples drawn from the optimized sample space can train a high-accuracy classifier, which keeps the classification accuracy stable. Result Three datasets were used in the experiments: India Pines, Salinas and PaviaU. Two groups of experiments were set up with different sample selection proportions and compared with classical and recent classification algorithms. In the first experiment, we selected 10%, 1% and 1% of the samples of the three datasets, respectively, for training. The OA (Overall Accuracy) values reached 98.93%, 99.78% and 99.40%, respectively, about 1% higher than the best competing algorithms. In the second, small-proportion experiment, we set the sample proportions to 1%, 0.3% and 0.4%.
For the India Pines dataset, the OA was as high as 90.48%; for Salinas it reached 99.68%; and for PaviaU it was 98.54%. On all three datasets, the OA was 4%-6% higher than that of other algorithms at the same proportion. The experimental results show that the proposed algorithm outperforms the other algorithms and improves classification accuracy substantially, especially in experiments with a small sample proportion. Conclusion In this paper, the sample space optimization strategy selects representative and balanced samples, so that classification accuracy remains high even with a small proportion of samples. The correction strategy based on neighborhood high-confidence information provides a clear further improvement. The algorithm also adapts to multiple datasets and is robust. In summary, the accuracy of traditional classification algorithms drops rapidly and becomes unstable as the sample proportion decreases. The proposed algorithm has a clear advantage because it ensures both high accuracy and stability of the classification results.
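The correction step from the Method can be sketched as follows: a PCA guide image, filtered class confidence maps, and relabeled low-confidence pixels. The abstract calls the filter edge-preserving and guided by the first principal component but does not name it. This sketch assumes the guided filter (box-window formulation) with radius `r=2` and regularization `eps=1e-3`. It also assumes that each low-confidence pixel takes the class with the largest filtered confidence. These choices, and the function names `box_mean`, `guided_filter`, `first_principal_component` and `correct_low_confidence`, are hypothetical.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image; windows are clipped at borders."""
    H, W = img.shape
    S = np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)  # S[i, j] = sum of img[:i, :j]
    i0 = np.clip(np.arange(H) - r, 0, H); i1 = np.clip(np.arange(H) + r + 1, 0, H)
    j0 = np.clip(np.arange(W) - r, 0, W); j1 = np.clip(np.arange(W) + r + 1, 0, W)
    total = S[i1][:, j1] - S[i0][:, j1] - S[i1][:, j0] + S[i0][:, j0]
    count = (i1 - i0)[:, None] * (j1 - j0)[None, :]
    return total / count


def guided_filter(I, p, r=2, eps=1e-3):
    """Guided filter: a locally linear model q = a*I + b fitted in every window."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    var_I = box_mean(I * I, r) - mean_I ** 2
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)


def first_principal_component(cube):
    """First PCA component of an (H, W, bands) cube, rescaled to [0, 1]."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    pc1 = (X @ Vt[0]).reshape(H, W)
    return (pc1 - pc1.min()) / (np.ptp(pc1) + 1e-12)


def correct_low_confidence(cube, labels_a, labels_b, n_classes, r=2, eps=1e-3):
    """Relabel pixels where the two classifiers disagree using filtered confidence maps."""
    guide = first_principal_component(cube)
    high = labels_a == labels_b
    scores = np.empty(labels_a.shape + (n_classes,))
    for c in range(n_classes):
        conf = (high & (labels_a == c)).astype(float)  # 1 only at agreed pixels of class c
        scores[..., c] = guided_filter(guide, conf, r, eps)
    out = labels_a.copy()
    low = ~high
    # a low-confidence pixel with no high-confidence neighbors gets an arbitrary argmax
    out[low] = scores[low].argmax(axis=-1)
    return out
```

High-confidence labels are left untouched, and only disagreeing pixels are overwritten. Because the guide image has a strong edge where the two materials meet, the linear model in each window stops class evidence from leaking across that boundary.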