Interpretable deep network for multispectral remote sensing image fusion

Yu Dian1, Li Kun1, Zhang Wei1, Li Duidui2, Tian Xin1, Jiang Hao1 (1. Electronic Information School, Wuhan University, Wuhan 430072, China; 2. China Centre for Resources Satellite Data and Application, Beijing 100094, China)

Abstract
Objective Multispectral image fusion is one of the key tasks in the field of remote sensing (RS). Variational model-based and deep learning-based techniques have been studied intensively in recent years. However, variational model-based approaches typically build the fusion model on a linear prior, which struggles to describe the complex nonlinear relationships of natural scenes; as a result, the imaging model is inaccurate and its hyper-parameters must be tuned by hand. Mainstream deep learning methods, in contrast, treat the fusion process as a black box and ignore the real physical imaging mechanism, so the performance of existing fusion methods still leaves room for improvement. To resolve these problems, we develop an interpretable deep network for fusing multispectral and panchromatic images.

Method First, we explore a deep prior that describes the relationship between the fusion image and the panchromatic image. A data fidelity term is then constructed on the assumption that the multispectral image is the down-sampled version of the fusion result. Integrating the deep prior with this data fidelity term yields a new fusion model with improved accuracy. To solve the model, we apply the proximal gradient descent method, which introduces intermediate variables that convert the original optimization problem into several iterative steps, and we simplify the iteration by assuming that the residual at each iteration follows a Gaussian distribution. Next, we unroll the optimization steps into a deep network composed of several sub-modules, so that the network parameters are driven jointly by the training data and the proposed physical fusion model, giving the network a clear physical interpretation. The hand-crafted hyper-parameters of the fusion model are likewise learned from the training data, which effectively removes the manual parameter tuning required by traditional variational model-based methods. Specifically, to build an interpretable end-to-end fusion network, we implement the optimization steps of each iteration with different network modules. To cope with the diverse spectral characteristics of the sensors on different satellites, the optical spectrum transform matrix is represented by two consecutive 3×3 convolution layers separated by a ReLU activation. The update of the introduced intermediate variable is treated as a denoising problem and handled by SwinResUnet, which incorporates convolutional neural network (CNN) and Swin-Transformer layers and thereby combines local feature extraction with global attention. A U-Net is adopted as the backbone of SwinResUnet, with three groups of encoders and decoders at different feature scales, and short connections in each encoder-decoder group enhance feature transmission and help avoid gradient explosion. Finally, the L1 norm between the fusion image and the reference image is used as the loss function.
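As a concrete illustration of the unrolled design described above, the PyTorch sketch below implements one iteration stage: a gradient step on the data fidelity term (the MSI taken as the down-sampled fusion result), followed by a learned proximal step in which a denoiser, conditioned on the PAN image through the two-convolution spectral transform, realizes the deep prior. This is a minimal sketch under our reading of the abstract; the module names are illustrative, and a small CNN stands in for the SwinResUnet denoiser.

```python
# Minimal sketch of one unrolled proximal-gradient stage (illustrative only,
# not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralTransform(nn.Module):
    """Optical spectrum transform: two consecutive 3x3 convolutions
    separated by a ReLU, mapping the fused bands to the PAN domain."""
    def __init__(self, bands, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class FusionStage(nn.Module):
    """One iteration unrolled as a network stage: a gradient step on the
    data fidelity ||D(X) - M||^2 (D = down-sampling), then a learned
    proximal step realized by a deep denoiser (the deep prior)."""
    def __init__(self, bands, denoiser, scale=4):
        super().__init__()
        self.spec = SpectralTransform(bands)   # relates fused bands to PAN
        self.denoiser = denoiser               # stand-in for SwinResUnet
        self.step = nn.Parameter(torch.tensor(0.1))  # learned step size
        self.scale = scale                     # PAN/MSI resolution ratio

    def forward(self, x, msi, pan):
        # Gradient of the data fidelity term: D^T (D x - m),
        # with bicubic resampling standing in for D and D^T.
        down = F.interpolate(x, scale_factor=1.0 / self.scale,
                             mode='bicubic', align_corners=False)
        grad = F.interpolate(down - msi, scale_factor=self.scale,
                             mode='bicubic', align_corners=False)
        v = x - self.step * grad
        # Proximal step: denoise v, conditioned on the PAN residual obtained
        # through the learned spectral transform.
        return self.denoiser(torch.cat([v, pan - self.spec(v)], dim=1))

# Illustrative use: a small CNN replaces SwinResUnet (4 MS bands plus the
# 1-channel PAN residual in, 4 bands out).
denoiser = nn.Sequential(nn.Conv2d(5, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 4, 3, padding=1))
stage = FusionStage(bands=4, denoiser=denoiser)
fused = stage(torch.rand(1, 4, 256, 256),   # current fusion estimate
              torch.rand(1, 4, 64, 64),     # MSI (1/4 resolution)
              torch.rand(1, 1, 256, 256))   # PAN
```

In the full network, several such stages would be cascaded, with the step size and spectral transform learned per stage from the training data.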
Result The experiments comprise three parts: 1) simulation experiments, 2) real-data experiments, and 3) an ablation study. Following Wald's protocol, the simulation experiments fuse the down-sampled multispectral image (MSI) and panchromatic image (PAN), whereas the real-data experiments fuse the original MSI and PAN. The comparison methods include: a) polynomial interpolation; b) Gram-Schmidt adaptive (GSA) and c) partial replacement-based adaptive component substitution (PRACS) as component substitution methods; d) Indusion and e) additive wavelet luminance proportional (AWLP) as multi-resolution analysis methods; f) simultaneous registration and fusion (SIRF) and g) local gradient constraints (LGC) as variational optimization methods; and h) pansharpening by a convolutional neural network (PNN), i) deep network architecture for pansharpening (PanNet), and j) interpretable deep network for variational pansharpening (VPNet) as deep learning methods. We demonstrate the superiority of our method in terms of visual quality and quantitative analysis on the simulated Gaofen-2 and GeoEye-1 satellite datasets and on the real QuickBird satellite dataset. The quantitative evaluation metrics include: 1) relative dimensionless global error in synthesis (ERGAS), 2) spectral angle mapping, 3) the global score Q2n, 4) the structural similarity index, 5) root mean square error, 6) relative average spectral error, 7) the universal image quality index, and 8) peak signal-to-noise ratio. Since no reference image exists in the real-data experiments, non-reference metrics are employed there, namely quality with no reference (QNR), Ds, and Dλ. Visual comparison: the proposed method shows a clear improvement over the other state-of-the-art methods. Quantitative evaluation: compared with the second-best method, ERGAS is reduced by 7.58% and 4.61% on the simulated Gaofen-2 and GeoEye-1 datasets, respectively.
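For reference, the headline metric ERGAS can be computed as follows. This is the standard definition (band-wise relative RMSE, averaged over bands and scaled by the resolution ratio), given here as a sketch rather than the authors' implementation; the ratio is 4 for the PAN/MS pairs of the satellites considered here.

```python
# Standard ERGAS computation (illustrative sketch).
import numpy as np

def ergas(reference, fused, ratio=4):
    """Relative dimensionless global error in synthesis; lower is better.

    reference, fused: float arrays of shape (bands, H, W).
    ratio: PAN-to-MSI resolution ratio (h/l in Wald's definition).
    """
    acc = 0.0
    bands = reference.shape[0]
    for k in range(bands):
        rmse = np.sqrt(np.mean((reference[k] - fused[k]) ** 2))
        acc += (rmse / np.mean(reference[k])) ** 2
    return 100.0 / ratio * np.sqrt(acc / bands)
```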
Conclusion Our interpretable deep network combines the advantages of variational model-based and deep learning-based approaches: it preserves spectral information effectively while enhancing the spatial detail of the fusion image, thus achieving a good balance between spectral and spatial quality.

Keywords
