Hyperspectral image classification evaluated across different datasets

Pan Erting1, Ma Yong1,2, Huang Jun1,2, Fan Fan1,2, Li Hao3, Ma Jiayi1,2 (1. Electronic Information School, Wuhan University, Wuhan 430072, China; 2. Institute of Aerospace Science and Technology, Wuhan University, Wuhan 430079, China; 3. College of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430023, China)

Abstract
Objective With the rapid development of hyperspectral imaging technology, hyperspectral data are being applied ever more widely, and the application of hyperspectral images in various scenes has created a growing demand for high-precision, detailed annotation. Existing hyperspectral classification models have mostly developed around supervised learning, and most methods are trained and evaluated within a single hyperspectral data cube. Because different hyperspectral datasets are collected in different scenes and cover inconsistent object categories, a trained model cannot be directly transferred to a new dataset to obtain reliable annotations, which also limits the further development of hyperspectral image classification models. This paper proposes a paradigm in which hyperspectral classification models are trained and evaluated across datasets. Method Inspired by zero-shot learning, this paper introduces the semantic information of hyperspectral category labels. The raw data and the label information of different datasets are each mapped into a common feature space to establish the association between seen and unseen categories; the two kinds of features from the training dataset are then mapped into a unified embedding space to learn the correspondence between hyperspectral visual features and label semantic features, and this correspondence is applied to the test dataset for label inference. Result Experiments were conducted on a pair of datasets acquired by the same sensor. We compared the semantic-to-visual and visual-to-semantic feature mapping directions and five zero-shot-learning-based feature mapping methods, realizing the training and evaluation of classification models on different datasets in the hyperspectral image classification task. Conclusion The experimental results show that the proposed zero-shot-learning-based hyperspectral classification model can be trained and evaluated across datasets and has development potential for hyperspectral image classification tasks.
Keywords
Hyperspectral image classification evaluated across different datasets

Pan Erting1, Ma Yong1,2, Huang Jun1,2, Fan Fan1,2, Li Hao3, Ma Jiayi1,2(1.Electronic Information School, Wuhan University, Wuhan 430072, China;2.Institute of Aerospace Science and Technology, Wuhan University, Wuhan 430079, China;3.College of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430023, China)

Abstract
Objective Hyperspectral sensors are evolving toward miniaturization and portability with the rapid development of hyperspectral imaging technology. This progress has made the acquisition of hyperspectral data easier and less expensive. The broad application of hyperspectral images in various scenes has given rise to an increasing demand for high-precision, detailed annotations. Existing hyperspectral classification models mainly focus on supervised learning, and many have achieved near-perfect performance. However, nearly all of these models are trained and evaluated on a single hyperspectral data cube. Under this condition, a trained classification model cannot be directly transferred to a new hyperspectral dataset to obtain reliable annotations. The main reason is that different hyperspectral datasets are collected in unrelated scenes and cover inconsistent object categories. Accordingly, existing hyperspectral classification models generalize poorly, which constrains the further development of hyperspectral image classification. Consequently, a hyperspectral classification model that can generalize and adapt to new, unseen classes across datasets must be developed. In this study, we propose a new paradigm: training and evaluating a hyperspectral classification model across different datasets. As previously mentioned, new, unseen categories may be encountered when evaluating the classification model on another hyperspectral dataset. We introduce zero-shot learning into hyperspectral image classification to address this problem. Method In addition to identifying categories that appear in the training set, zero-shot learning can distinguish data from unseen categories. Its principle is to let the model first learn to understand the semantics of categories.
Specifically, this mechanism employs auxiliary knowledge (such as attributes) to embed category labels into a semantic space and uses the data in the training set to learn the mapping from images to semantics. On the basis of zero-shot learning, we introduce hyperspectral category label semantics as side information to handle the unseen categories that arise when classifying across different datasets. The overall workflow of our model can be divided into three steps. The first step is feature extraction, including hyperspectral image visual feature extraction and label semantics extraction. Hyperspectral image feature extraction aims to obtain high-level hyperspectral features with strong discriminative capability, which we refer to as visual features. Most existing hyperspectral classification models are designed to extract robust hyperspectral features and achieve good classification performance. Hence, we directly fine-tune existing models to embed hyperspectral images into the visual feature space. For label semantics extraction, a word2vec model maps each word to a vector that captures the relationships between words. We employ a word2vec model trained on a large external corpus to obtain hyperspectral label vectors; this model embeds each hyperspectral category label into a label semantic space. The second step is feature mapping. Depending on the choice of embedding space, it can be divided into visual-to-semantic and semantic-to-visual feature mapping. Feature mapping projects the two kinds of features into the same feature space, and the model learns and optimizes the mapping on the basis of the correspondence between the hyperspectral data and annotations in the training set. The learned mapping establishes the correspondence between the hyperspectral visual features and the label semantic features. The third step is to employ this learned mapping to perform label inference on the testing set.
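The visual-to-semantic direction described above can be sketched with a simple ridge-regularized linear map. Everything below is a simulated stand-in: the dimensions, the class "label vectors", and the "visual features" are random placeholders (in the paper, label vectors come from a word2vec model trained on an external corpus and visual features from a fine-tuned classification network).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 seen classes with 300-d word2vec-style label
# vectors and 64-d visual features (all simulated here).
n_classes, d_sem, d_vis = 3, 300, 64
label_vecs = rng.normal(size=(n_classes, d_sem))     # label semantic space
proto = rng.normal(size=(n_classes, d_vis))          # class prototypes
y = rng.integers(0, n_classes, size=500)
X = proto[y] + 0.1 * rng.normal(size=(500, d_vis))   # visual features

# Visual-to-semantic mapping: ridge-regularized least squares from each
# sample's visual feature to the word vector of its label.
S = label_vecs[y]                                    # regression targets
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d_vis), X.T @ S)

# Label inference: map a feature into semantic space, then pick the
# class whose label vector is most similar (cosine similarity).
def cos(a, b):
    return a @ b.T / (np.linalg.norm(a, axis=1, keepdims=True)
                      * np.linalg.norm(b, axis=1))

pred = cos(X @ W, label_vecs).argmax(axis=1)
print("accuracy on seen classes:", (pred == y).mean())
```

In the zero-shot setting, the same learned map `W` would be applied to features of a new dataset, scoring them against the word vectors of categories never seen in training.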
Specifically, the mapping is applied to the hyperspectral visual features and label semantic features of the testing data, and the corresponding labels for the test set are inferred by measuring the similarity of the two features. Result We selected a pair of datasets collected by the same type of hyperspectral sensor for comparative experiments, namely, the Salinas and Indian Pines datasets. This choice avoids differences in the physical representation of object spectra that arise when hyperspectral data are collected by different sensors. Because the workflow of our method comprises three steps, we adopt different models in each step for comparative experiments. We compared two visual feature extraction models and several feature mapping models, including visual-to-semantic and semantic-to-visual mapping models. The quantitative evaluation metric is top-k accuracy, and we report top-1 and top-3 accuracy. Experimental results show that the method employing the spatial-spectral united network (SSUN) as the visual feature extraction model and the relation network (RN) as the zero-shot learning model reaches the best performance. Comparative experiments with different visual feature extraction models demonstrate that SSUN obtains more distinguishable features because it considers features in both the spatial and spectral domains. Among all feature mapping models, the semantic-to-visual mapping models outperform the other approaches, indicating that choosing the visual feature space as the embedding space is preferable. We also analyze in detail the reasons for the currently unsatisfactory classification performance.
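The top-k accuracy used as the evaluation metric counts a prediction as correct if the true label is among the k highest-scoring classes. A minimal sketch, assuming a score matrix of shape (samples, classes):

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    topk = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best classes
    return np.mean([labels[i] in topk[i] for i in range(len(labels))])

# Toy example: 3 samples, 3 classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 0])
print(top_k_accuracy(scores, labels, 1))   # only the first sample is correct
print(top_k_accuracy(scores, labels, 3))   # with k = classes, always 1.0
```

Top-1 accuracy is ordinary classification accuracy; top-3 is more forgiving and is useful when semantically close categories compete for the highest score.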
Conclusion In this study, we propose a specific pattern for training and evaluating a classification model across different datasets to address the poor transferability of existing classification models. We introduce the semantic description of category labels into hyperspectral classification to cope with new, unseen categories in a new hyperspectral dataset and to establish the association between seen and unseen categories. The experimental results show that the feature extraction model SSUN improves performance and that the semantic-to-visual feature mapping model RN outperforms the other approaches. In conclusion, the proposed hyperspectral classification model based on zero-shot learning can be trained and evaluated across different datasets and has development potential in hyperspectral image classification tasks.
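The favored semantic-to-visual direction can also be sketched with a linear stand-in (the paper's RN learns a deeper, non-linear similarity): label vectors are mapped into the visual feature space, and a sample is assigned to the class whose mapped label vector is nearest. All data and dimensions below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins: 3 classes, 300-d label vectors, 64-d visual features.
n_classes, d_sem, d_vis = 3, 300, 64
label_vecs = rng.normal(size=(n_classes, d_sem))
proto = rng.normal(size=(n_classes, d_vis))
y = rng.integers(0, n_classes, size=500)
X = proto[y] + 0.1 * rng.normal(size=(500, d_vis))

# Semantic-to-visual mapping: least-squares map from each class's label
# vector to the mean visual feature of that class, so the embedding space
# is now the visual space.
means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
M = np.linalg.lstsq(label_vecs, means, rcond=None)[0]   # (d_sem, d_vis)

# Label inference: nearest mapped label vector in visual space.
mapped = label_vecs @ M                                 # class anchors
dist = ((X[:, None, :] - mapped[None, :, :]) ** 2).sum(axis=2)
pred = dist.argmin(axis=1)
print("accuracy on seen classes:", (pred == y).mean())
```

Mapping into the visual space keeps class anchors in the richer, better-separated feature space, which is one intuition for why the paper finds this direction preferable.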
Keywords
