Published: 2023-01-16
DOI: 10.11834/jig.220538
2023 | Volume 28 | Number 1

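Remote Sensing Image Processing

Progressive-enhancement pan-sharpening algorithm for remote sensing images based on channel fusion

Jia Yanan, Guo Xiaojie

College of Intelligence and Computing, Tianjin University, Tianjin 300350

Received: 2022-06-02; Revised: 2022-10-10; Preprint: 2022-10-17

Funding: General Program of the National Natural Science Foundation of China (62072327)

About the authors: Jia Yanan, male, master's student; his research interest is remote sensing image fusion. E-mail: jyn@tju.edu.cn
Guo Xiaojie, corresponding author, male, associate professor; his research interests are computer vision, pattern recognition, and multimedia. E-mail: xj.max.guo@gmail.com
*Corresponding author: Guo Xiaojie xj.max.guo@gmail.com

CLC number: TP399

Document code: A

Article ID: 1006-8961(2023)01-0305-12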

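Abstract

Objective Remote sensing image fusion aims to fuse a low-spatial-resolution multispectral image with the corresponding high-spatial-resolution panchromatic image into a high-spatial-resolution multispectral image. To address the quality degradation and spatial discontinuities caused by up-sampling the multispectral image, this paper proposes a progressive enhancement strategy; to better fuse the complementary information of the two images, it also proposes a strategy that performs fusion along the channel dimension. Method An end-to-end network with two stages is constructed: a progressive-scale detail enhancement stage and a channel fusion stage. Because up-sampling the low-spatial-resolution multispectral image blurs details, the first stage uses panchromatic images at different scales as additional information and enhances the multispectral image step by step through two detail enhancement modules. In the second stage, the panchromatic image is fused with each channel of the multispectral image through structure-preserving modules, which makes better use of the complementary information in the two images and yields a high-spatial-resolution multispectral image. Result On the GaoFen-2 and QuickBird datasets, the method is compared with eight strong methods. It achieves the best values on the full-reference metrics peak signal-to-noise ratio (PSNR), structural similarity (SSIM), correlation coefficient (CC), and erreur relative globale adimensionnelle de synthese (ERGAS). On GaoFen-2, PSNR, CC, and ERGAS improve on average by 0.872 dB, 0.01, and 0.109, respectively; on QuickBird, they improve by 0.755 dB, 0.011, and 0.099. Conclusion The proposed algorithm performs well in both spatial resolution and spectral preservation and produces higher-quality fusion results.

Keywords

pan-sharpening; progressive enhancement; channel fusion; deep learning; multispectral image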

Remote sensing pan-sharpening based on channel fusion and progressive enhancement

Jia Yanan, Guo Xiaojie

College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

Supported by: National Natural Science Foundation of China (62072327)

Abstract

Objective Pan-sharpening of remote sensing (RS) images aims to produce high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) images with the corresponding high-spatial-resolution panchromatic (PAN) images. It is widely used as a pre-processing step for vision applications such as object detection, environmental surveillance, landscape monitoring, and scene segmentation. The key problem is how to gather the distinct, complementary information carried by the two sources. Existing pan-sharpening methods fall into four groups: 1) component substitution (CS), 2) multi-resolution analysis (MRA), 3) model-based, and 4) deep-learning-based. CS methods are easy to use but suffer from severe spectral distortion because they ignore the differences between PAN and LRMS images. MRA methods extract spatial features from PAN images through multi-scale transforms and inject these high-resolution features into the up-sampled LRMS images; spatial details are preserved well, but the injection is likely to corrupt spectral information. Model-based methods require optimization algorithms that are complicated and time-consuming. Deep-learning-based methods perform well, but two problems remain: 1) up-sampling the multispectral image degrades image quality, and 2) the two modalities are integrated insufficiently because the variability across channels is overlooked. To alleviate these problems, we adopt a channel fusion strategy that better mines the information in the two modalities, and we propose a progressive detail enhancement module to counter the quality degradation caused by up-sampling.
Method Most deep-learning-based methods up-sample the multispectral image directly to the size of the panchromatic image, which degrades image quality and loses some spatial details. To obtain enhanced results gradually, we perform progressive-scale detail enhancement using the information in multi-scale panchromatic images. A channel fusion strategy then fuses the enhanced multispectral image with the corresponding panchromatic image so that the useful information of both modalities is captured for predicting the HRMS image. Channel fusion can be summarized in three steps: 1) decomposition, 2) fusion, and 3) reconstruction. In the decomposition step, each channel of the enhanced multispectral image is concatenated with the panchromatic image, and a shallow feature is obtained by two 3×3 convolutional layers. The fusion step applies a new fusion strategy built from 8 structure-preserving modules over the channels. Each structure-preserving module has four branches, equal to the number of channels in the multispectral image; each branch extracts features with convolutional layers, and a residual connection is added to each branch for efficient information transfer. In the reconstruction step, the features of all channels are first re-integrated and then remapped into the high-resolution multispectral image.
Result Our model is compared with 8 state-of-the-art pan-sharpening methods, including traditional approaches and deep learning methods, on two datasets, GaoFen-2 and QuickBird. The quantitative evaluation metrics are peak signal-to-noise ratio (PSNR), structural similarity (SSIM), correlation coefficient (CC), spectral angle mapper (SAM), erreur relative globale adimensionnelle de synthese (ERGAS), quality-with-no-reference (QNR), $D_\lambda$, and $D_S$. Compared with the other methods, PSNR, SSIM, CC, and ERGAS improve by 0.872 dB, 0.005, 0.01, and 0.109 on average on the GaoFen-2 dataset, and by 0.755 dB, 0.011, 0.004, and 0.099 on the QuickBird dataset. A series of ablation experiments is also conducted to clarify the effectiveness of the individual modules of the fusion algorithm.
Conclusion A novel two-stage framework for pan-sharpening is developed, which effectively enhances the detail information of LRMS images and produces appealing HRMS images. The progressive detail enhancement stage enhances the LRMS images with the additional information in multi-scale PAN images, while the channel fusion stage fuses the channel features through structure-preserving modules. Ablation studies verify the effectiveness of these designs, and experimental results on widely used datasets demonstrate the advantages of our method over other state-of-the-art methods.

Key words

pan-sharpening; progressive enhancement; channel fusion; deep learning; multispectral image

0 Introduction

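High-resolution multispectral images play an important role in remote sensing applications such as object detection (Cheng and Han, 2016), scene classification (Nogueira et al., 2017), environmental monitoring (Foody, 2003), and land surveying (Mulders, 2001). However, because of limits on onboard storage and transmission technology (Thomas et al., 2008), most satellites cannot acquire high-resolution multispectral images directly; they provide only low-resolution multispectral images and high-resolution panchromatic images. A low-resolution multispectral image carries rich spectral information but lacks spatial detail, whereas the corresponding high-resolution panchromatic image carries rich detail but almost no spectral information. Neither image alone meets the needs of practical applications, so it is necessary to fuse the two into a high-resolution multispectral image. This fusion technique is called panchromatic and multispectral image fusion, or pan-sharpening (Tang et al., 2023).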

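Traditional panchromatic and multispectral image fusion methods can be roughly divided into component substitution methods, multi-resolution analysis methods, and model-based methods. Common component substitution algorithms include principal component analysis (PCA) (Ghadjati et al., 2019), intensity-hue-saturation (IHS) based methods (Tu et al., 2001), and the Gram-Schmidt (GS) transform (Laben and Brower, 2000). These methods use a linear transform to decompose the up-sampled low-resolution multispectral image into spectral and spatial components and then replace the spatial component with the panchromatic image. Component substitution is fast and easy to implement, but it ignores the differences between the two images during fusion; the direct replacement destroys the low-frequency information of the image and causes severe spectral distortion. Multi-resolution analysis methods (Liu, 2000; Schowengerdt, 1980) use multi-scale filters to extract the high-frequency information of the panchromatic image and inject it into the up-sampled low-resolution multispectral image. They reduce spectral distortion to some extent but are prone to aliasing and edge artifacts (Fang et al., 2020). Model-based methods (Ballester et al., 2006; Fang et al., 2013; Vivone et al., 2015, 2018) assume that the low-resolution multispectral image and the panchromatic image are, respectively, spatially and spectrally degraded versions of the high-resolution multispectral image. Under this assumption, they build different models of the relationship between the source images and the high-resolution multispectral image. Reality, however, is far more complex than the assumption, and designing a good optimization algorithm for the model is also extremely difficult.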

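In recent years, deep-learning-based panchromatic and multispectral image fusion has become a new research focus. Deep-learning-based methods have stronger nonlinear fitting ability and can achieve better fusion results than traditional methods. Masi et al. (2016) proposed pan-sharpening by convolutional neural networks (PNN), a three-layer convolutional network that takes the up-sampled low-resolution multispectral image concatenated with the panchromatic image as input and reconstructs the high-resolution multispectral image through convolutions. Three convolutional layers, however, cannot fully exploit the fitting ability of deep neural networks and leave considerable room for improvement. Many researchers have therefore increased the network depth. PANNet (Yang et al., 2017), for example, uses a deeper network to extract high-frequency information and injects it into the up-sampled multispectral image. Wei et al. (2017) improved PNN with the ResNet (He et al., 2016) architecture and proposed the deep residual pan-sharpening neural network (DRPNN). Wang et al. (2019) built a 44-layer network with densely connected structures; the added depth improved performance to some extent. As depth grows, however, training becomes increasingly difficult, and simply adding layers does not make full use of the characteristics of the two image modalities.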

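To better obtain the different information in the two modalities and fully exploit the spatial and spectral characteristics of the two images, many methods extract different features with multi-branch structures. For example, Liu et al. (2020) proposed the two-stream fusion network (TFNet), which uses two branches to extract the spectral information of the low-resolution multispectral image and the spatial information of the panchromatic image, and then integrates the two kinds of features to reconstruct the high-resolution multispectral image. MPNet (multispectral pan-sharpening network) (Wang et al., 2021) extracts features from the source images with 2-D and 3-D convolutions and adds a fusion branch to fuse the extracted features. FDFNet (full-depth feature fusion network) (Jin et al., 2022b) fuses the panchromatic and multispectral images with three branches and adds interaction between the branches. Because the multispectral and panchromatic images differ in resolution, all of these methods directly up-sample the multispectral image to match resolutions. However, directly up-sampling the multispectral image to 4 times its original resolution leaves visibly discontinuous regions in space, which makes the subsequent restoration harder and leaves some regions impossible to recover fully. As shown in Fig. 1(c), the road in the selected region is broken and interrupted.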

Fig. 1 An example from the GaoFen-2 dataset

((a) up-sampled low-resolution multispectral(LRMS) image; (b) panchromatic(PAN) image; (c) PANNet; (d) ours)

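To reduce spatial discontinuity, some methods avoid directly up-sampling the multispectral image and instead use deep neural networks to learn the transformation between domains and obtain an enhanced information representation. SDPNet (surface and deep-level constraint-based pan-sharpening network) (Xu et al., 2021a) learns the transformation between the two modalities with two different encoder-decoder networks and then uses these networks to extract features at different depths as the enhanced representation. Jin et al. (2022a) extracted features at different scales with a Laplacian pyramid and restored the image at each scale, which partly avoids the quality degradation caused by direct up-sampling. Wang et al. (2021) proposed SSConv (spectral-to-spatial convolution), which converts spectral features into the spatial domain and avoids the up-sampling operation. However, because no reference image is available for supervision, these methods usually follow the Wald protocol (Wald et al., 1997): the down-sampled source images serve as training data and the source images serve as reference images. This treatment also lowers image quality. Some researchers have therefore tried unsupervised methods based on generative adversarial networks that train directly on the source images. PANGAN (pan-sharpening based on a generative adversarial network) (Ma et al., 2020) uses two discriminators so that the panchromatic image and the multispectral image each learn adversarially against the generated high-resolution multispectral image. HPGAN (hyperspectral pansharpening using 3-D generative adversarial networks) (Xie et al., 2021) constrains the fusion process globally, spectrally, and spatially, and proposes a 3-D spectral-to-spatial generator to produce the high-resolution multispectral image. In addition, Li et al. (2021) used a generator with a Laplacian pyramid structure to extract features at multiple scales and then constructed the final result from these features. However, these GAN-based methods easily suffer from mode collapse and vanishing gradients during training (Nagarajan and Kolter, 2017) and have difficulty converging to a good result.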

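To solve the problems above, this paper proposes an end-to-end progressive detail enhancement network. After the enhanced multispectral image is obtained, the multispectral image and the panchromatic image are concatenated along the channel dimension and fused so that the features of both images are fully extracted; finally, the extracted features are reconstructed into a high-resolution multispectral image. As shown in Fig. 2, the network has two stages: progressive-scale detail enhancement and channel fusion. In the progressive-scale detail enhancement stage, the panchromatic image is down-sampled to 1/2 and 1/4 of its original size and concatenated with the multispectral image, and two detail enhancement modules produce the enhanced representation of the multispectral image. In the channel fusion stage, each channel of the enhanced multispectral image is fused with the panchromatic image through structure-preserving modules, and the features of all channels are then re-integrated to give the final fusion result. As shown in Fig. 1, the fusion result of the proposed method retains more color information spectrally and maintains the continuity of details spatially. The main contributions of this paper are summarized as follows: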

Fig. 2 Architecture of the proposed network

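1) An end-to-end network with progressive detail enhancement and structure preservation is proposed; it effectively resolves spatial discontinuity and achieves leading performance.

2) Fusion is performed at the channel level, which fully uses the rich spatial information of the panchromatic image while preserving the spectral information of the low-resolution multispectral image, and yields better fusion results.

3) Comprehensive ablation experiments verify the effectiveness of each module, and extensive comparative experiments show that the fusion results of the proposed method are the best in both visual quality and objective metrics.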

1 Proposed method

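In this paper, the low-resolution multispectral image is denoted $\boldsymbol{I}_{\mathrm{LRMS}} \in \mathbf{R}^{m n \times C}$, where $m$ and $n$ are the height and width of the low-resolution multispectral image and $C$ is its number of channels $(C=4)$; $\boldsymbol{I}_{\mathrm{PAN}} \in \mathbf{R}^{(r m)(r n) \times c}$ denotes the panchromatic image, whose spatial resolution is $r$ times that of the multispectral image, with $c$ channels $(c=1)$; and $\boldsymbol{I}_{\mathrm{HRMS}} \in \mathbf{R}^{(r m)(r n) \times C}$ denotes the ideal high-resolution multispectral image. As shown in Fig. 2, in the progressive-scale detail enhancement stage, two detail enhancement modules progressively increase the resolution of the multispectral image by a factor of 4; in the channel fusion stage, the proposed structure-preserving modules fuse the panchromatic image with the decomposed enhanced image channel by channel, and the features are finally reconstructed into the high-resolution multispectral image. The whole network is trained end to end under the constraints of a multi-scale enhancement loss and a reconstruction loss.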

1.1 Progressive-scale detail enhancement

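A problem shared by most deep-learning-based pan-sharpening algorithms is that they directly up-sample the multispectral image to match the size of the panchromatic image. This simple approach degrades image quality and loses part of the spatial detail. If the additional information in the panchromatic image is used to enhance the multispectral image before fusion, the missing spatial information can be partly recovered. Injecting the information directly into the multispectral image, however, may cause edge artifacts. To solve these problems, this paper proposes a progressive-scale detail enhancement strategy that uses panchromatic images at different scales to obtain the enhanced result step by step.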

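First, the multispectral image $\boldsymbol{I}_{\mathrm{LRMS}}$ is concatenated along the channel dimension with the panchromatic image down-sampled by a factor of 4, $\boldsymbol{I}_{\mathrm{PAN} \downarrow 4}$, and passed through a detail enhancement module to obtain a multispectral image with twice the spatial resolution: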

$ \boldsymbol{I}_{\mathrm{MS} \uparrow 2}=\phi\left(C\left(\boldsymbol{I}_{\mathrm{LRMS}}, \boldsymbol{I}_{\mathrm{PAN} \downarrow 4}\right)\right) $

(1)

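where $C(\cdot)$ denotes concatenation along the channel dimension and $\phi(\cdot)$ denotes processing by a detail enhancement module. The 2× multispectral image $\boldsymbol{I}_{\mathrm{MS} \uparrow 2}$ is then concatenated with the panchromatic image down-sampled by a factor of 2, $\boldsymbol{I}_{\mathrm{PAN} \downarrow 2}$, and enhanced by a second detail enhancement module to obtain a multispectral image whose resolution matches that of the panchromatic image: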

$ \boldsymbol{I}_{\mathrm{MS} \uparrow 4}=\phi\left(C\left(\boldsymbol{I}_{\mathrm{MS} \uparrow 2}, \boldsymbol{I}_{\mathrm{PAN} \downarrow 2}\right)\right) $

(2)

where $\boldsymbol{I}_{\mathrm{MS} \uparrow 4}$ denotes the enhanced multispectral image enlarged 4 times.

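The detail enhancement module plays the key role in the enhancement process. As shown in Fig. 2, it first uses a transposed convolution layer to enlarge the image to twice its original size, with ReLU (rectified linear unit) as the activation function. Two 3×3 convolutions then integrate the features of the two modalities to give the enhanced multispectral representation. The whole procedure can be expressed as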

$ \boldsymbol{F}_1=\psi\left(\boldsymbol{W}_{\mathrm{T}} * \boldsymbol{F}_{\mathrm{in}}\right) $

(3)

$ \boldsymbol{F}_2=\psi\left(\boldsymbol{W}_1 * \boldsymbol{F}_1\right) $

(4)

$ \boldsymbol{F}_3=\boldsymbol{W}_2 * \boldsymbol{F}_2 $

(5)

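where $\boldsymbol{F}_{\mathrm{in}}$ denotes the image fed into the detail enhancement module, $\boldsymbol{F}_i$ the feature output by the $i$-th convolutional layer, $\psi(\cdot)$ the ReLU activation function, $*$ the convolution operation, $\boldsymbol{W}_{\mathrm{T}}$ the weights of the transposed convolution, and $\boldsymbol{W}_1$ and $\boldsymbol{W}_2$ the weights of the two convolutions.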
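A minimal PyTorch sketch of Eqs. (1)-(5) may help make the data flow concrete. The layer widths (`mid_ch=32`), the transposed-convolution setting (kernel 4, stride 2, padding 1), and bicubic down-sampling of the PAN image are assumptions chosen for illustration; the paper does not state them. The names `DetailEnhancement` and `progressive_enhance` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DetailEnhancement(nn.Module):
    """Sketch of Eqs. (3)-(5): transposed conv + ReLU, conv + ReLU, conv."""

    def __init__(self, in_ch=5, mid_ch=32, out_ch=4):
        super().__init__()
        # Assumed setting: kernel 4, stride 2, padding 1 doubles H and W exactly,
        # since (H - 1) * 2 - 2 * 1 + 4 = 2H.
        self.up = nn.ConvTranspose2d(in_ch, mid_ch, kernel_size=4, stride=2, padding=1)
        self.conv1 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        f1 = F.relu(self.up(x))      # Eq. (3)
        f2 = F.relu(self.conv1(f1))  # Eq. (4)
        return self.conv2(f2)        # Eq. (5)


def progressive_enhance(lrms, pan, phi1, phi2):
    """Sketch of Eqs. (1)-(2); bicubic PAN down-sampling is an assumption."""
    pan_d4 = F.interpolate(pan, scale_factor=0.25, mode="bicubic", align_corners=False)
    pan_d2 = F.interpolate(pan, scale_factor=0.5, mode="bicubic", align_corners=False)
    ms_x2 = phi1(torch.cat([lrms, pan_d4], dim=1))   # Eq. (1): 4 MS + 1 PAN channels
    ms_x4 = phi2(torch.cat([ms_x2, pan_d2], dim=1))  # Eq. (2)
    return ms_x2, ms_x4


if __name__ == "__main__":
    lrms = torch.rand(1, 4, 64, 64)    # patch sizes as in Section 2.1
    pan = torch.rand(1, 1, 256, 256)
    ms_x2, ms_x4 = progressive_enhance(lrms, pan, DetailEnhancement(), DetailEnhancement())
    print(ms_x2.shape, ms_x4.shape)
```

Both intermediate outputs are returned because the multi-scale enhancement loss in Section 1.3 supervises each of them.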

1.2 Channel fusion

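In the progressive-scale enhancement stage, the low-resolution multispectral image is enhanced with the help of the additional information in the panchromatic image and brought to the same size as the panchromatic image, but the spectral and spatial information of the two images is not yet fully used. To fuse the complementary features of the two modalities thoroughly and remove the influence of edge artifacts, this paper proposes a channel-wise fusion strategy: the multispectral image is first split along the channel dimension, the panchromatic image is fused with every channel of the multispectral image, and the fused features are then integrated and reconstructed into a high-resolution multispectral image. As shown in Fig. 2, channel fusion can be summarized in three steps: decomposition, fusion, and reconstruction.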

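In the decomposition step, each channel of the multispectral image is concatenated with the panchromatic image and then projected into feature space by two 3×3 convolutions, which gives a shallow feature representation: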

$ \begin{gathered} \boldsymbol{F}_{\mathrm{IN}_i}=\boldsymbol{W}_{d 2} * \psi\left(\boldsymbol{W}_{d 1} * C\left(\boldsymbol{I}_{\mathrm{MS}_i}, \boldsymbol{I}_{\mathrm{PAN}}\right)\right) \\ i=1, 2, 3, 4 \end{gathered} $

(6)

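where $\boldsymbol{I}_{\mathrm{MS}_i}$ denotes the $i$-th channel of the multispectral image, $\boldsymbol{W}_{d 1}$ and $\boldsymbol{W}_{d 2}$ are the weights of the two convolution kernels, and $\boldsymbol{F}_{\mathrm{IN}_i}$ denotes the $i$-th generated feature. The subsequent fusion step uses a new fusion strategy in which 8 structure-preserving modules perform fusion over the channels. As shown in Fig. 2, each structure-preserving module has 4 branches, equal to the number of channels of the multispectral image; each branch extracts features with convolutions, and a residual connection is added to each branch for effective information transfer. The structure-preserving module can be expressed as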

$ \boldsymbol{F}_{\mathrm{FUSE}_i}=\boldsymbol{W}_{f 2} * \psi\left(\boldsymbol{W}_{f 1} * \boldsymbol{F}_{\mathrm{IN}_i}\right), i=1, 2, 3, 4 $

(7)

$ \boldsymbol{F}_{\mathrm{OUT}_i}=\boldsymbol{F}_{\mathrm{FUSE}_i}+\boldsymbol{F}_{\mathrm{IN}_i}, i=1, 2, 3, 4 $

(8)

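where $\boldsymbol{W}_{f 1}$ and $\boldsymbol{W}_{f 2}$ are the weights of the convolutions on each branch and $\boldsymbol{F}_{\mathrm{OUT}_i}$ denotes the $i$-th feature output by the structure-preserving module. The feature $\boldsymbol{F}_{\mathrm{OUT}_i}$ output by one module serves as the input feature $\boldsymbol{F}_{\mathrm{IN}_i}$ of the next, so as the number of structure-preserving modules increases, more image features are extracted and fused in preparation for the final image reconstruction. In the reconstruction step, the features of all channels are first re-integrated, and convolutional layers then remap them into the high-resolution multispectral image. This process can be expressed as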

$ \boldsymbol{F}_{\mathrm{OUT}}=C\left(\boldsymbol{F}_{\mathrm{OUT}_1}, \boldsymbol{F}_{\mathrm{OUT}_2}, \boldsymbol{F}_{\mathrm{OUT}_3}, \boldsymbol{F}_{\mathrm{OUT}_4}\right) $

(9)

$ \boldsymbol{I}_{\text {HRMS }}=S\left(\boldsymbol{W}_{f 2} * \psi\left(\boldsymbol{W}_{f 1} * \boldsymbol{F}_{\text {OUT }}\right)\right) $

(10)

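where $S(\cdot)$ denotes the Sigmoid activation function and $\boldsymbol{I}_{\mathrm{HRMS}}$ the generated high-resolution multispectral image. These steps reconstruct the features into the desired high-resolution multispectral image.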
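The three steps of Eqs. (6)-(10) can be sketched in PyTorch as follows. This is an illustrative reading and not the authors' code: the feature width (`feat=16`), the use of separate weights for each branch, and the class names `StructurePreserving` and `ChannelFusion` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructurePreserving(nn.Module):
    """Sketch of Eqs. (7)-(8): one conv-ReLU-conv branch with a residual per MS channel."""

    def __init__(self, num_branches=4, feat=16):
        super().__init__()
        self.conv1 = nn.ModuleList(
            [nn.Conv2d(feat, feat, 3, padding=1) for _ in range(num_branches)])
        self.conv2 = nn.ModuleList(
            [nn.Conv2d(feat, feat, 3, padding=1) for _ in range(num_branches)])

    def forward(self, feats):
        # F_OUT_i = W_f2 * relu(W_f1 * F_IN_i) + F_IN_i, branch by branch
        return [c2(F.relu(c1(f))) + f
                for c1, c2, f in zip(self.conv1, self.conv2, feats)]


class ChannelFusion(nn.Module):
    """Decomposition, fusion with stacked modules, and reconstruction."""

    def __init__(self, num_channels=4, feat=16, num_blocks=8):
        super().__init__()
        # Eq. (6): each branch sees one MS channel plus the PAN image (2 input channels).
        self.decompose = nn.ModuleList([
            nn.Sequential(nn.Conv2d(2, feat, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(feat, feat, 3, padding=1))
            for _ in range(num_channels)])
        self.blocks = nn.ModuleList(
            [StructurePreserving(num_channels, feat) for _ in range(num_blocks)])
        # Eqs. (9)-(10): concatenate all branches, then conv-ReLU-conv-Sigmoid.
        self.reconstruct = nn.Sequential(
            nn.Conv2d(num_channels * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, num_channels, 3, padding=1), nn.Sigmoid())

    def forward(self, ms_x4, pan):
        feats = [d(torch.cat([ms_x4[:, i:i + 1], pan], dim=1))
                 for i, d in enumerate(self.decompose)]
        for block in self.blocks:
            feats = block(feats)  # output of one module feeds the next
        return self.reconstruct(torch.cat(feats, dim=1))


if __name__ == "__main__":
    net = ChannelFusion()
    out = net(torch.rand(1, 4, 64, 64), torch.rand(1, 1, 64, 64))
    print(out.shape)
```

Because the final activation is a Sigmoid, the sketch implicitly assumes that pixel values are normalized to the range [0, 1].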

1.3 Training details

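The training loss has two parts: a multi-scale enhancement loss and a reconstruction loss. The reconstruction loss mainly ensures that the generated high-resolution multispectral image is structurally consistent with the reference image: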

$ L_{\text {recon }}=\left\|\boldsymbol{I}_{\mathrm{GT}}-\boldsymbol{I}_{\text {HRMS }}\right\|_1 $

(11)

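where $\boldsymbol{I}_{\mathrm{GT}}$ and $\boldsymbol{I}_{\mathrm{HRMS}}$ denote the reference image and the generated high-resolution multispectral image, and $\|\cdot\|_1$ denotes the $\ell_1$ norm. In addition, a multi-scale detail enhancement loss is added to supervise the progressive enhancement process: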

$ L_{\mathrm{en}}=\left\|\boldsymbol{I}_{\mathrm{GT} \downarrow 2}-\boldsymbol{I}_{\mathrm{MS} \uparrow 2}\right\|_2+\left\|\boldsymbol{I}_{\mathrm{GT}}-\boldsymbol{I}_{\mathrm{MS} \uparrow 4}\right\|_2 $

(12)

where $\|\cdot\|_2$ denotes the $\ell_2$ norm. The total training loss is

$ L=L_{\text {recon }}+L_{\text {en }} $

(13)

Under the constraint of this loss function, the whole network can be trained end to end.
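As a rough illustration, Eqs. (11)-(13) can be written literally in PyTorch. The sketch takes the norms over all elements of the batch and obtains $\boldsymbol{I}_{\mathrm{GT} \downarrow 2}$ by bicubic interpolation; both choices are assumptions, and practical implementations often use mean-reduced or squared variants instead. The helper names `downsample2` and `total_loss` are hypothetical.

```python
import torch
import torch.nn.functional as F


def downsample2(img):
    """Assumed operator for the 2x down-sampled reference (bicubic interpolation)."""
    return F.interpolate(img, scale_factor=0.5, mode="bicubic", align_corners=False)


def l1_norm(x):
    return x.abs().sum()


def l2_norm(x):
    return x.pow(2).sum().sqrt()


def total_loss(hrms, ms_x2, ms_x4, gt):
    """L = L_recon + L_en, following Eqs. (11)-(13) term by term."""
    l_recon = l1_norm(gt - hrms)                                     # Eq. (11)
    l_en = l2_norm(downsample2(gt) - ms_x2) + l2_norm(gt - ms_x4)    # Eq. (12)
    return l_recon + l_en                                            # Eq. (13)


if __name__ == "__main__":
    gt = torch.rand(2, 4, 32, 32)
    pred = torch.rand(2, 4, 32, 32)
    print(total_loss(pred, downsample2(pred), pred, gt).item())
```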

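The algorithm is implemented in the PyTorch framework and optimized with SGD (stochastic gradient descent) for 150 epochs. The initial learning rate is 0.0005 and is multiplied by 0.1 every 40 epochs; the batch size is 8.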
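The step decay described here amounts to a simple closed form, sketched below in plain Python. The function name `step_lr` and the zero-based epoch index are assumptions; in PyTorch the same schedule would typically be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)`.

```python
def step_lr(epoch, base_lr=5e-4, step=40, gamma=0.1):
    """Learning rate at a zero-based epoch: multiplied by `gamma` every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


if __name__ == "__main__":
    # Epochs 0-39 use 5e-4, 40-79 use 5e-5, 80-119 use 5e-6, 120-149 use 5e-7.
    for epoch in (0, 39, 40, 80, 120, 149):
        print(epoch, step_lr(epoch))
```

Over the 150 training epochs the rate is therefore reduced three times, ending at about one thousandth of its initial value.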

2 Experimental results

2.1 Datasets and evaluation metrics

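The GaoFen-2 (GF-2) and QuickBird (QB) datasets are used to evaluate the algorithm. Because no reference images are available for supervision, the Wald protocol (Wald et al., 1997) is used to down-sample the source images into training data. Specifically, the panchromatic and multispectral images are first down-sampled to 1/4 of the source size to serve as their low-resolution versions, and the original multispectral image serves as the reference image. The low-resolution multispectral image, the panchromatic image, and the reference image are then cropped into patches of 64×64×4, 256×256×1, and 256×256×4, respectively. Finally, 90% of the patches form the training set and the remaining 10% are used for testing.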
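The construction of one training triplet under the Wald protocol can be sketched in plain Python. The box (average-pooling) filter used for down-sampling is an assumption made to keep the example self-contained; the text does not name the filter. The function names `downsample` and `wald_triplet` are hypothetical.

```python
def downsample(img, factor=4):
    """Average-pool a 2-D list of lists by `factor` (assumed box filter)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


def wald_triplet(ms_bands, pan, factor=4):
    """Return (low-res MS input, low-res PAN input, reference) for training."""
    lrms = [downsample(band, factor) for band in ms_bands]
    lr_pan = downsample(pan, factor)
    return lrms, lr_pan, ms_bands  # the original MS image is the reference


if __name__ == "__main__":
    # A 256x256x4 MS patch and its 1024x1024 PAN patch give the
    # 64x64x4 / 256x256x1 / 256x256x4 training sizes quoted in the text.
    ms = [[[0.5] * 256 for _ in range(256)] for _ in range(4)]
    pan = [[0.5] * 1024 for _ in range(1024)]
    lrms, lr_pan, ref = wald_triplet(ms, pan)
    print(len(lrms), len(lrms[0]), len(lr_pan), len(ref[0]))
```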

In the test stage, to evaluate each algorithm quantitatively, peak signal-to-noise ratio (PSNR) (Horé and Ziou, 2010), structural similarity (SSIM) (Wang et al., 2004), correlation coefficient (CC) (Kaneko et al., 2003), spectral angle mapper (SAM) (Yuhas et al., 1992), and erreur relative globale adimensionnelle de synthese (ERGAS) (Wald, 2000) are used in the reduced-resolution experiments. The quality-with-no-reference (QNR) index, the spectral distortion index $D_\lambda$, and the spatial distortion index $D_S$ (Alparone et al., 2008) are used in the full-resolution experiments.
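For readers unfamiliar with the reference-based metrics, the sketch below implements the standard definitions of PSNR, SAM (in radians), and ERGAS with the Python standard library only. An image is represented as a list of bands, each band a flat list of pixel values; this layout, the `max_val=1.0` peak value, and the function names are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
import math


def psnr(ref, fused, max_val=1.0):
    """Peak signal-to-noise ratio in dB over all bands and pixels."""
    n = sum(len(band) for band in ref)
    mse = sum((r - f) ** 2
              for rb, fb in zip(ref, fused) for r, f in zip(rb, fb)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def sam(ref, fused):
    """Mean spectral angle (radians) between per-pixel spectral vectors."""
    angles = []
    for r_vec, f_vec in zip(zip(*ref), zip(*fused)):  # iterate over pixels
        dot = sum(r * f for r, f in zip(r_vec, f_vec))
        norm = (math.sqrt(sum(r * r for r in r_vec))
                * math.sqrt(sum(f * f for f in f_vec)))
        if norm == 0:
            angles.append(0.0)  # convention for all-zero spectra
            continue
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return sum(angles) / len(angles)


def ergas(ref, fused, ratio=4):
    """ERGAS = (100 / ratio) * sqrt(mean over bands of (RMSE_band / mean_band)^2)."""
    acc = 0.0
    for rb, fb in zip(ref, fused):
        rmse = math.sqrt(sum((r - f) ** 2 for r, f in zip(rb, fb)) / len(rb))
        acc += (rmse / (sum(rb) / len(rb))) ** 2
    return 100.0 / ratio * math.sqrt(acc / len(ref))


if __name__ == "__main__":
    ref = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7]]
    fused = [[0.21, 0.39, 0.61], [0.29, 0.52, 0.69]]
    print(psnr(ref, fused), sam(ref, fused), ergas(ref, fused))
```

Higher PSNR is better, while SAM and ERGAS are error measures for which lower is better, matching the arrows in the tables.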

2.2 Comparison with other algorithms

2.2.1 Reduced-resolution comparison

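To verify the proposed algorithm, four traditional algorithms and four deep-learning-based algorithms are chosen for comparison. The traditional algorithms are GS (Laben and Brower, 2000), SFIM (smoothing filter-based intensity modulation) (Liu, 2000), CNMF (coupled nonnegative matrix factorization) (Yokoya et al., 2012), and MTF-GLP (modulation transfer function generalized Laplacian pyramid) (Vivone et al., 2018). The deep-learning-based methods are PANNet (Yang et al., 2017), TFNet (Liu et al., 2020), FDFNet (Jin et al., 2022b), and GPPNN (gradient projection based pan-sharpening neural network) (Xu et al., 2021b). All deep-learning-based methods are retrained on the same datasets on a computer with an NVIDIA GeForce RTX 3060 GPU and 16 GB of RAM; the traditional methods run on an Intel i5-9400F CPU. Fig. 3 and Fig. 4 show the visual comparison on the GF-2 and QB datasets; for ease of viewing, only the R, G, and B channels of the generated high-resolution multispectral images are shown. In both figures, the results of the traditional methods look gray and dark overall and exhibit severe spectral degradation: they do not preserve the color information of the multispectral image. The deep-learning-based methods preserve more spectral information, but some results show an obvious color cast, such as TFNet and GPPNN (Fig. 3(h), Fig. 4(h), Fig. 3(j), and Fig. 4(j)), and the PANNet result (Fig. 3(g)) has obvious edge artifacts. The fusion results of the proposed algorithm perform well in both spatial and spectral preservation, with neither an obvious color cast nor edge artifacts, and the overall image quality is markedly higher. The objective metrics are compared in Table 1 and Table 2: the proposed algorithm achieves the best result on all five full-reference metrics, which is consistent with the visual results.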

Fig. 3 Reduced-resolution visual comparison on the GaoFen-2 dataset

((a) LRMS image; (b) PAN image; (c) GS; (d) SFIM; (e) CNMF; (f) MTF-GLP; (g) PANNet; (h) TFNet; (i) FDFNet; (j) GPPNN; (k) ours; (l) ground truth)

Fig. 4 Reduced-resolution visual comparison on the QuickBird dataset

((a) LRMS image; (b) PAN image; (c) GS; (d) SFIM; (e) CNMF; (f) MTF-GLP; (g) PANNet; (h) TFNet; (i) FDFNet; (j) GPPNN; (k) ours; (l) ground truth)

Table 1 Numerical comparison on the GaoFen-2 dataset

Method	PSNR↑/dB	SSIM ↑	CC ↑	SAM ↓	ERGAS ↓	$D_\lambda \downarrow $	$ D_S \downarrow$	QNR ↑
GS	33.698 0	0.825 9	0.845 2	0.033 2	2.861 5	0.012 6	0.131 3	0.858 1
SFIM	33.690 3	0.819 5	0.879 3	0.031 3	2.867 0	0.025 6	0.150 5	0.829 0
CNMF	34.454 8	0.841 6	0.892 2	0.034 2	2.696 7	0.026 3	0.163 2	0.816 3
MTF-GLP	32.418 9	0.809 1	0.881 1	0.038 6	3.311 1	0.038 3	0.144 8	0.824 4
PANNet	38.772 8	0.923 6	0.938 4	0.021 8	1.605 1	0.007 2	0.035 0	0.958 0
TFNet	43.055 9	0.970 9	0.976 3	0.016 1	0.960 1	0.004 9	0.011 1	0.983 9
FDFNet	41.530 2	0.957 4	0.967 3	0.017 7	1.168 7	0.003 5	0.012 5	0.984 0
GPPNN	42.564 1	0.971 2	0.975 9	0.016 3	1.014 6	0.004 6	0.019 4	0.976 1
Ours	43.927 3	0.976 5	0.980 7	0.015 2	0.861 6	0.004 2	0.010 2	0.985 6
Note: Bold marks the best value in each column and underline marks the second best. ↑ means higher is better; ↓ means lower is better.

Table 2 Numerical comparison on the QuickBird dataset

Method	PSNR↑/dB	SSIM ↑	CC ↑	SAM ↓	ERGAS ↓	$D_\lambda \downarrow $	$D_S \downarrow $	QNR ↑
GS	28.421 3	0.759 4	0.795 9	0.074 6	5.608 0	0.021 3	0.142 0	0.839 6
SFIM	29.202 8	0.741 1	0.864 8	0.067 1	5.778 8	0.031 2	0.189 5	0.785 2
CNMF	28.755 5	0.830 7	0.876 3	0.076 8	5.238 5	0.043 2	0.212 5	0.753 4
MTF-GLP	27.617 1	0.731 4	0.846 2	0.077 9	5.923 9	0.052 1	0.141 3	0.813 9
PANNet	32.205 6	0.864 7	0.934 5	0.047 1	2.530 6	0.023 6	0.016 1	0.960 6
TFNet	35.644 5	0.928 9	0.970 2	0.034 8	1.701 7	0.010 4	0.029 7	0.960 0
FDFNet	34.468 0	0.911 7	0.962 7	0.038 0	1.915 5	0.007 3	0.018 2	0.974 5
GPPNN	35.376 3	0.927 5	0.967 6	0.035 0	1.784 1	0.011 1	0.018 6	0.970 4
Ours	36.398 7	0.939 4	0.974 5	0.032 2	1.575 6	0.009 4	0.019 9	0.970 7
Note: Bold marks the best value in each column and underline marks the second best. ↑ means higher is better; ↓ means lower is better.

2.2.2 Full-resolution comparison

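To verify how well the algorithm generalizes to images that have not been down-sampled, a full-resolution comparison is carried out. The source multispectral and panchromatic images are cropped directly into patches of 128×128×4 and 512×512 for testing. Fig. 5 and Fig. 6 show the visual comparison on the GF-2 and QuickBird datasets. In Fig. 5, the traditional methods SFIM (Fig. 5(d)) and CNMF (Fig. 5(e)) both show an obvious color cast. In the region marked by the red box, the proposed algorithm reduces artifacts: the stripe artifacts present in the low-resolution multispectral image do not appear in Fig. 5(k), and the visual quality is clearly better than that of the other methods. In the enlarged region of Fig. 6, the proposed algorithm clearly preserves the continuity of the road, whereas other methods, such as PANNet (Fig. 6(g)), FDFNet (Fig. 6(i)), and GPPNN (Fig. 6(j)), fail to do so and show obvious breaks in image content. Table 1 shows that the proposed algorithm achieves the best or second-best values on the three no-reference metrics QNR (quality-with-no-reference), $D_\lambda$, and $D_S$, which indicates that the model also generalizes well to images that have not been down-sampled. In Table 2, the proposed algorithm does not achieve the best values on these metrics. The reason is that $D_\lambda$ measures the spectral difference between the generated result and the low-resolution multispectral image, while $D_S$ measures the spatial deviation between the fusion result and the panchromatic image. Compared with the other methods, the proposed algorithm recovers more details and removes the grid artifacts of the panchromatic image, so it scores worse on $D_S$, which in turn lowers QNR. The advantage of the proposed algorithm can be seen directly in the images: in Fig. 6(k), it restores the continuity of the road and removes the grid artifacts present in the panchromatic image.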
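The dependence of QNR on $D_S$ can be checked numerically. In the definition of Alparone et al. (2008), QNR is $(1-D_\lambda)^{\alpha}(1-D_S)^{\beta}$; the sketch below assumes the common choice $\alpha=\beta=1$, which the text does not state explicitly. Because the tabulated values are presumably averaged over test images, recomputing QNR from the averaged $D_\lambda$ and $D_S$ only approximates the tabulated QNR.

```python
def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Quality-with-no-reference index from spectral and spatial distortion."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta


if __name__ == "__main__":
    # D_lambda and D_S values copied from Table 1 (GaoFen-2) and Table 2 (QuickBird).
    print(round(qnr(0.0042, 0.0102), 4))  # ours, GaoFen-2: about 0.9856
    print(round(qnr(0.0094, 0.0199), 4))  # ours, QuickBird: about 0.9709
    print(round(qnr(0.0073, 0.0182), 4))  # FDFNet, QuickBird: about 0.9746
```

On QuickBird the proposed method's $D_S$ of 0.019 9 is slightly larger than FDFNet's 0.018 2, which is enough to put its QNR below FDFNet's in this approximation, in line with the explanation above.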

Fig. 5 Full-resolution visual comparison on the GaoFen-2 dataset

((a) LRMS image; (b) PAN image; (c) GS; (d) SFIM; (e) CNMF; (f) MTF-GLP; (g) PANNet; (h) TFNet; (i) FDFNet; (j) GPPNN; (k) ours)

Fig. 6 Full-resolution visual comparison on the QuickBird dataset

((a) LRMS image; (b) PAN image; (c) GS; (d) SFIM; (e) CNMF; (f) MTF-GLP; (g) PANNet; (h) TFNet; (i) FDFNet; (j) GPPNN; (k) ours)

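2.3 Ablation study

To examine the effectiveness of each module, ablation experiments are conducted with four different settings, as shown in Table 3.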

Table 3 Results of the ablation experiments on the GaoFen-2 dataset

Setting	Detail enhancement module	Number of structure-preserving modules	Multi-scale detail enhancement loss	PSNR↑/dB	SSIM ↑	CC ↑	SAM ↓	ERGAS ↓
1	×	8	×	41.532 4	0.964 2	0.970 3	0.020 6	1.120 3
2	√	4	√	38.285 9	0.923 3	0.935 2	0.026 5	1.666 8
3	√	10	√	43.725 8	0.976 9	0.980 6	0.015 2	0.879 3
4	√	8	×	42.957 7	0.973 2	0.977 5	0.018 1	0.957 4
Ours	√	8	√	43.927 3	0.976 5	0.980 7	0.015 2	0.861 6
Note: Bold marks the best value in each column; × means not used and √ means used.

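1) In progressive-scale detail enhancement, two detail enhancement modules progressively increase the resolution of the multispectral image until it matches that of the panchromatic image. In the first experiment, the up-sampled multispectral image is used directly as the input of the next stage to verify the effectiveness of the detail enhancement module. With the module removed, all reference metrics drop markedly, which shows that the proposed detail enhancement module is essential for this task.

2) The second experiment reduces the number of structure-preserving modules to verify their effectiveness and a reasonable stacking depth. The number of modules is reduced to 4. Table 3 shows that removing half of the structure-preserving modules lowers network performance, which indicates that the module plays an important role in the network.

3) The number of structure-preserving modules is increased to 10, with the remaining configuration the same as in setting 2). Adding more modules yields no overall gain (only SSIM rises slightly, from 0.976 5 to 0.976 9), possibly because the network depth does not match the scale of the data.

4) The multi-scale detail enhancement loss is removed to test whether it is effective. Without it, the network's metrics decline to some extent, which demonstrates the value of this loss for model performance. The loss is therefore necessary for the network to perform at its best.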

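2.4 Model parameters and runtime

To compare model size and average processing time, tests are run on the GaoFen-2 full-resolution test set, which contains 180 multispectral/panchromatic image pairs with multispectral images of size 128×128×4 and panchromatic images of size 512×512. The results are given in Table 4. The deep-learning-based methods run faster than the traditional methods, although, as noted in Section 2.2.1, the former run on a GPU and the latter on a CPU. TFNet is the fastest, taking only 0.006 s per image pair on average, and PANNet has the fewest parameters, only 0.3 M. The proposed algorithm balances runtime, parameter count, and quality: it achieves a large performance gain at a small cost in time and space.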

Table 4 Comparison of runtime and model size

Method	Parameters/M	Average time/s
GS	-	0.177
SFIM	-	0.227
CNMF	-	3.239
MTF-GLP	-	0.456
PANNet	0.3	0.007
TFNet	9.5	0.006
FDFNet	1.0	0.014
GPPNN	1.5	0.013
Ours	1.4	0.010
Note: Bold marks the best value in each column; "-" means the method has no trainable parameters.

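3 Conclusion

This paper proposes an end-to-end progressive detail enhancement pan-sharpening algorithm based on channel fusion, which uses channel fusion to fully exploit the complementary information of the two modalities and generate high-resolution multispectral images. To address the quality degradation and loss of spatial detail that up-sampling the multispectral image causes in earlier methods, the multispectral image is enhanced progressively instead of being up-sampled directly. Specifically, the algorithm has two stages: progressive-scale detail enhancement and channel fusion. In progressive-scale detail enhancement, the multispectral image is concatenated with panchromatic images at different scales, and the rich detail in the panchromatic image is used to enhance the multispectral image. In channel fusion, the enhanced multispectral image is split by channel into 4 sub-images, each sub-image is fused with the panchromatic image, and the resulting features are reconstructed into a high-spatial-resolution multispectral image. The ablation experiments verify the effectiveness of each module, and the comparative experiments show that the proposed algorithm clearly surpasses previous algorithms in both visual quality and numerical performance, which demonstrates the effectiveness of the model. At present the method has been shown to work well only on 4-channel multispectral image fusion. Extending it to data with more channels, such as Landsat8, requires changing the number of channels in each module of the channel fusion stage, which is inflexible. Future work should therefore focus on a fusion structure that adapts to the number of channels of the multispectral image.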

References

Alparone L, Aiazzi B, Baronti S, Garzelli A, Nencini F, Selva M. 2008. Multispectral and panchromatic data fusion assessment without reference. Photogrammetric Engineering and Remote Sensing, 74(2): 193-200 [DOI:10.14358/PERS.74.2.193]

Ballester C, Caselles V, Igual L, Verdera J, Rougé B. 2006. A variational model for P+XS image fusion. International Journal of Computer Vision, 69(1): 43-58 [DOI:10.1007/s11263-006-6852-x]

Cheng G, Han J W. 2016. A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 117: 11-28 [DOI:10.1016/j.isprsjprs.2016.03.014]

Fang F M, Li F, Shen C M, Zhang G X. 2013. A variational approach for pan-sharpening. IEEE Transactions on Image Processing, 22(7): 2822-2834 [DOI:10.1109/TIP.2013.2258355]

Fang S, Chao L, Cao F Y. 2020. New pan-sharpening method based on adaptive weight mechanism. Journal of Image and Graphics, 25(3): 546-557 (方帅, 潮蕾, 曹风云. 2020. 自适应权重注入机制遥感图像融合. 中国图象图形学报, 25(3): 546-557) [DOI:10.11834/jig.190280]

Foody G M. 2003. Remote sensing of tropical forest environments: towards the monitoring of environmental resources for sustainable development. International Journal of Remote Sensing, 24(20): 4035-4046 [DOI:10.1080/0143116031000103853]

Ghadjati M, Moussaoui A, Boukharouba A. 2019. A novel iterative PCA-based pansharpening method. Remote Sensing Letters, 10(3): 264-273 [DOI:10.1080/2150704X.2018.1547443]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]

Horé A and Ziou D. 2010. Image quality metrics: PSNR vs. SSIM//Proceedings of the 20th International Conference on Pattern Recognition. Istanbul, Turkey: IEEE: 2366-2369 [DOI: 10.1109/ICPR.2010.579]

Jin C, Deng L J, Huang T Z, Vivone G. 2022a. Laplacian pyramid networks: a new approach for multispectral pansharpening. Information Fusion, 78: 158-170 [DOI:10.1016/j.inffus.2021.09.002]

Jin Z R, Zhuo Y W, Zhang T J, Jin X X, Jing S Q, Deng L J. 2022b. Remote sensing pansharpening by full-depth feature fusion. Remote Sensing, 14(3): #466 [DOI:10.3390/rs14030466]

Kaneko S, Satoh Y, Igarashi S. 2003. Using selective correlation coefficient for robust image registration. Pattern Recognition, 36(5): 1165-1173 [DOI:10.1016/S0031-3203(02)00081-X]

Laben C A and Brower B V. 2000. Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening. U.S., No. 6011875

Li C J, Song H H, Zhang K H, Zhang X L, Liu Q S. 2021. Spatiotemporal fusion of satellite images via conditional generative adversarial learning. Journal of Image and Graphics, 26(3): 714-726 (李昌洁, 宋慧慧, 张开华, 张晓露, 刘青山. 2021. 条件生成对抗遥感图像时空融合. 中国图象图形学报, 26(3): 714-726) [DOI:10.11834/jig.200219]

Liu J G. 2000. Smoothing filter-based intensity modulation: a spectral preserve image fusion technique for improving spatial details. International Journal of Remote Sensing, 21(18): 3461-3472 [DOI:10.1080/014311600750037499]

Liu X Y, Liu Q J, Wang Y H. 2020. Remote sensing image fusion based on two-stream fusion network. Information Fusion, 55: 1-15 [DOI:10.1016/j.inffus.2019.07.010]

Ma J Y, Yu W, Chen C, Liang P W, Guo X J, Jiang J J. 2020. Pan-GAN: an unsupervised pan-sharpening method for remote sensing image fusion. Information Fusion, 62: 110-120 [DOI:10.1016/j.inffus.2020.04.006]

Masi G, Cozzolino D, Verdoliva L, Scarpa G. 2016. Pansharpening by convolutional neural networks. Remote Sensing, 8(7): #594 [DOI:10.3390/rs8070594]

Mulders M A. 2001. Advances in the application of remote sensing and GIS for surveying mountainous land. International Journal of Applied Earth Observation and Geoinformation, 3(1): 3-10 [DOI:10.1016/S0303-2434(01)85015-7]

Nagarajan V and Kolter J Z. 2017. Gradient descent GAN optimization is locally stable//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 5591-5600

Nogueira K, Penatti O A B, Dos Santos J A. 2017. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61: 539-556 [DOI:10.1016/j.patcog.2016.07.001]

Schowengerdt R A. 1980. Reconstruction of multispatial, multispectral image data using spatial frequency content. Photogrammetric Engineering and Remote Sensing, 46(10): 1325-1334

Tang L F, Zhang H, Xu H, Ma J Y. 2023. Deep learning-based image fusion: a survey. Journal of Image and Graphics, 28(1): 3-36 (唐霖峰, 张浩, 徐涵, 马佳义. 2023. 基于深度学习的图像融合方法综述. 中国图象图形学报, 28(1): 3-36) [DOI:10.11834/jig.220422]

Thomas C, Ranchin T, Wald L, Chanussot J. 2008. Synthesis of multispectral images to high spatial resolution: a critical review of fusion methods based on remote sensing physics. IEEE Transactions on Geoscience and Remote Sensing, 46(5): 1301-1312 [DOI:10.1109/tgrs.2007.912448]

Tu T M, Su S C, Shyu H C, Huang P S. 2001. A new look at IHS-like image fusion methods. Information Fusion, 2(3): 177-186 [DOI:10.1016/S1566-2535(01)00036-7]

Vivone G, Restaino R, Chanussot J. 2018. Full scale regression-based injection coefficients for panchromatic sharpening. IEEE Transactions on Image Processing, 27(7): 3418-3431 [DOI:10.1109/TIP.2018.2819501]

Vivone G, Simões M, Mura M D, Restaino R, Bioucas-Dias J M, Licciardi G A, Chanussot J. 2015. Pansharpening based on semiblind deconvolution. IEEE Transactions on Geoscience and Remote Sensing, 53(4): 1997-2010 [DOI:10.1109/TGRS.2014.2351754]

Wald L, Ranchin T, Mangolini M. 1997. Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images. Photogrammetric Engineering and Remote Sensing, 63(6): 691-699

Wald L. 2000. Quality of high resolution synthesised images: Is there a simple criterion//Third Conference "Fusion of Earth data: merging point measurements, raster maps and remotely sensed images". SEE/URISCA: 99-103

Wang D, Bai Y P, Wu C Y, Li Y, Shang C J, Shen Q. 2021. Convolutional LSTM-based hierarchical feature fusion for multispectral pan-sharpening. IEEE Transactions on Geoscience and Remote Sensing, 60: 1-16 [DOI:10.1109/TGRS.2021.3104221]

Wang D, Li Y, Ma L, Bai Z W, Chan J C W. 2019. Going deeper with densely connected convolutional neural networks for multispectral pansharpening. Remote Sensing, 11(22): #2608 [DOI:10.3390/rs11222608]

Wang Y D, Deng L J, Zhang T J and Wu X. 2021. SSconv: explicit spectral-to-spatial convolution for pansharpening//Proceedings of the 29th ACM International Conference on Multimedia. [s. l. ]: ACM: 4472-4480 [DOI: 10.1145/3474085.3475600]

Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI:10.1109/TIP.2003.819861]

Wei Y C, Yuan Q Q, Shen H F, Zhang L P. 2017. Boosting the accuracy of multispectral image pansharpening by learning a deep residual network. IEEE Geoscience and Remote Sensing Letters, 14(10): 1795-1799 [DOI:10.1109/LGRS.2017.2736020]

Xie W Y, Cui Y H, Li Y S, Lei J, Du Q, Li J J. 2021. HPGAN: hyperspectral pansharpening using 3-D generative adversarial networks. IEEE Transactions on Geoscience and Remote Sensing, 59(1): 463-477 [DOI:10.1109/TGRS.2020.2994238]

Xu H, Ma J Y, Shao Z F, Zhang H, Jiang J J, Guo X J. 2021a. SDPNet: a deep network for pan-sharpening with enhanced information representation. IEEE Transactions on Geoscience and Remote Sensing, 59(5): 4120-4134 [DOI:10.1109/TGRS.2020.3022482]

Xu S, Zhang J S, Zhao Z X, Sun K, Liu J M and Zhang C X. 2021b. Deep gradient projection networks for pan-sharpening//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1366-1375 [DOI: 10.1109/CVPR46437.2021.00142]

Yang J F, Fu X Y, Hu Y W, Huang Y, Ding X H and Paisley J. 2017. PanNet: a deep network architecture for pan-sharpening//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 1753-1761 [DOI: 10.1109/ICCV.2017.193]

Yokoya N, Yairi T, Iwasaki A. 2012. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Transactions on Geoscience and Remote Sensing, 50(2): 528-537 [DOI:10.1109/TGRS.2011.2161320]

Yuhas R H, Goetz A F H and Boardman J W. 1992. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm [EB/OL]. [2022-06-02]. https://ntrs.nasa.gov/api/citations/19940012238/downloads/19940012238.pdf