Hyperspectral image classification combining an inverted feature pyramid and U-Net

Cheng Songyang; Xiong Yujie; Yao Yao; Li Qingli

Published: 2021-08-18
DOI: 10.11834/jig.210194
2021 | Volume 26 | Number 8

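Cheng Songyang¹, Xiong Yujie¹,², Yao Yao¹, Li Qingli² (1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; 2. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China)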

Abstract (Chinese version)

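Objective Ground-object classification is an important task in earth observation research. Hyperspectral images carry rich spectral information about ground objects and can be used to improve the accuracy of ground-object classification in remote sensing images. How to extract and represent features from hyperspectral images effectively is the key problem in hyperspectral image classification. To this end, this paper proposes a hyperspectral image classification method that combines an inverted feature pyramid with U-Net. Method Principal component analysis (PCA) is applied to reduce the dimensionality of the hyperspectral data and obtain the reconstructed image data used as the network input; U-Net then extracts spatial features from the reconstructed hyperspectral image layer by layer. At the same time, an inverted feature pyramid network extracts semantic features at the corresponding levels; feature fusion then yields a feature representation that has both rich spatial information and strong semantic responses. The proposed network uses an attention mechanism in the skip connections to suppress feature responses from background regions, and ultimately achieves high ground-object classification accuracy. Result The effects of PCA dimensionality reduction and of the input data size on classification performance are analyzed, and comparative experiments are conducted on the Indian Pines, Pavia University, Salinas, and Urban datasets. The proposed method achieves overall classification accuracies of 98.91%, 99.85%, 99.99%, and 87.43% on the four datasets, respectively, which is 1% to 15% higher than related algorithms such as the support vector machine (SVM). Conclusion This paper proposes a hyperspectral image classification method that combines an inverted feature pyramid with U-Net. The method can be applied to hyperspectral image classification with limited training samples and achieves high classification accuracy on multiple datasets. The experimental results show that combining the inverted feature pyramid structure with U-Net extracts and represents hyperspectral image features efficiently, which yields finer classification results.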

Keywords (Chinese version)

hyperspectral image classification; scarce samples; inverted feature pyramid network (IFPN); U-Net; feature fusion

Hyperspectral image classification using an inverted feature pyramid network with U-Net

Cheng Songyang¹, Xiong Yujie¹,², Yao Yao¹, Li Qingli² (1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; 2. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China)

Abstract

Objective Terrain classification is an important task in earth observation with remote sensing. Hyperspectral images carry rich spectral information and can therefore be applied to remote sensing image classification. With the rapid development of hyperspectral technology, the processing and analysis of hyperspectral remote sensing images has attracted wide attention in academia. Compared with traditional panchromatic and multispectral remote sensing images, hyperspectral images have dozens or even hundreds of contiguous narrow spectral bands, which provide detailed spectral and spatial feature information. Accordingly, these images have been widely used in areas such as precision agriculture, city planning, and military defense. Hyperspectral data are high-dimensional and contain redundancy and noise; thus, the data must be transformed before image processing. In hyperspectral image classification, representing the features of the image effectively is the most critical step in current studies. In this work, we propose an approach for hyperspectral image classification that uses an inverted feature pyramid network and U-Net. Method Hyperspectral remote sensing image data are high-dimensional. Principal component analysis (PCA) concentrates the useful information in the images into the k most important components, thereby reducing the amount of data and enhancing the data features. After PCA, the data are divided into samples with a sliding window. The area surrounding each pixel is defined as a patch, which serves as the input of the proposed network. The category of the pixel is the ground truth label. In the first stage, U-Net is used to extract spatial features of the hyperspectral image at the pixel level.
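The preprocessing described above (PCA to k components, then one sliding-window patch per labelled pixel) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `pca_reduce` and `extract_patches`, the reflect padding at the image border, and the convention that label 0 marks unlabelled pixels are all assumptions made for the example.

```python
import numpy as np


def pca_reduce(cube, k):
    """Project an (H, W, B) hyperspectral cube onto its k leading principal components."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)  # astype returns a copy
    flat -= flat.mean(axis=0)                      # center each spectral band
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:k].T).reshape(h, w, k)


def extract_patches(cube, labels, size):
    """Return one (size, size, k) patch per labelled pixel, plus that pixel's label.

    Assumptions: label 0 marks unlabelled pixels; borders are reflect-padded.
    """
    lo = size // 2
    hi = size - lo - 1                             # also handles even sizes
    padded = np.pad(cube, ((lo, hi), (lo, hi), (0, 0)), mode="reflect")
    rows, cols = np.nonzero(labels)
    patches = np.stack([padded[r:r + size, c:c + size] for r, c in zip(rows, cols)])
    return patches, labels[rows, cols]
```

With this layout, the labelled pixel sits at offset `(size // 2, size // 2)` inside its patch, and the patch label is the ground truth class of that pixel, as in the text.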
The left side of the network is the contracting path, which corresponds to the encoder of the classic encoder-decoder structure. The right side of the network is the expansive path, which can be regarded as the decoder. The feature maps in the expansive path result from combining two parts along two dimensions, which makes the acquired features more prominent. For the first part, the feature maps from the same layer of the contracting path and the feature maps from the upper layer of the expansive path are fed simultaneously to the attention mechanism, so the feature regions of this part receive higher weights. The second part is obtained by deconvolution of the feature map from the upper layer of the expansive path. Layer by layer, these feature maps with rich spatial information are fused with feature maps containing rich semantic information obtained from the inverted feature pyramid network layers. The fused feature maps therefore carry both reliable spatial information and strong semantic information. Finally, owing to the attention mechanism, the weights of the effective features in the image are increased and irrelevant background regions are suppressed, and the classification result of the hyperspectral image is obtained. Result We conduct experiments to evaluate the effectiveness of the proposed method and investigate how the number of principal components retained by PCA and the size of the input data affect classification performance. We conduct comparative experiments on four publicly available hyperspectral image datasets: Indian Pines, Pavia University, Salinas, and Urban. Experimental results show that the proposed method is effective; the best numbers of retained principal components are 3, 20, 10, and 3, and the best input sizes are 64, 32, 32, and 64, respectively.
On the four hyperspectral image datasets, the proposed method obtains overall accuracies of 98.91%, 99.85%, 99.99%, and 87.43%, average accuracies of 98.07%, 99.39%, 99.09%, and 78.30%, and Kappa values of 0.987, 0.998, 0.999, and 0.831, respectively, which are higher than those of the other classification algorithms compared. Conclusion With hundreds of contiguous, finely divided spectral bands, hyperspectral images can accurately present the rich terrain information of a specific region; however, each spectral band also contains useless information. Extracting the key terrain information from hyperspectral data effectively and using it for classification is the most important and difficult problem. We propose to combine U-Net and the inverted feature pyramid network for hyperspectral image classification. First, we reduce the dimension of the hyperspectral image data with PCA. We then use a sliding window to build patches from the reduced data and feed these patches into the model. U-Net serves as the backbone of the proposed network and extracts the features of the hyperspectral image. The features rich in spatial information are then fused with the features from the inverted feature pyramid network, which yields abundant spectral and spatial information. The attention mechanism allows the model to focus on spectral and spatial information effectively and reduces the influence of noise on classification performance. Experimental results show that the proposed method can be applied to hyperspectral image classification tasks with limited training samples and achieves good classification results. The classification accuracy of a hyperspectral image can also be improved by properly handling the input data.
In future work, we will investigate how to simplify the model's structure while maintaining high hyperspectral image classification performance with fewer training samples.
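The three figures reported in the Result part (overall accuracy, average accuracy, and the Kappa value) are standard quantities derived from a confusion matrix. The sketch below shows one common way to compute them; it is not taken from the paper, and the function name `classification_scores` is hypothetical. It assumes integer class labels in the range 0 to `n_classes - 1` and that every class occurs at least once among the true labels.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes):
    """Return (overall accuracy, average accuracy, Cohen's kappa)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)        # rows: true class, columns: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                 # fraction of correctly labelled pixels
    per_class = np.diag(cm) / cm.sum(axis=1)  # accuracy within each true class
    aa = per_class.mean()
    # Agreement expected by chance, from the row and column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Average accuracy weights every class equally, so it drops faster than overall accuracy when small classes are misclassified, which is consistent with the larger gap between the two values reported for the Urban dataset.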

Keywords

hyperspectral image classification; small samples; inverted feature pyramid network (IFPN); U-Net; feature fusion
