Spatiotemporal fusion of satellite images via conditional generative adversarial learning
2021, Vol. 26, No. 3, Pages 714-726
Received: 2020-06-08
Revised: 2020-06-24
Accepted: 2020-07-01
Published in print: 2021-03-16
DOI: 10.11834/jig.200219
Objective
Hardware limitations of satellite remote sensing technology lead to a trade-off between the temporal and spatial resolutions of acquired remote sensing images. Spatiotemporal fusion addresses this problem by providing an efficient, low-cost way to fuse two types of remote sensing data with complementary spatiotemporal characteristics (typically Landsat and MODIS (moderate-resolution imaging spectroradiometer) images), generating fused data with both high temporal and high spatial resolution.
Method
A spatiotemporal fusion method based on the conditional generative adversarial network (CGAN) is proposed, which can efficiently handle the massive remote sensing data encountered in practical applications. Compared with existing learning-based models, the proposed model has the following advantages: 1) it explicitly correlates MODIS and Landsat images by learning a nonlinear mapping; 2) it automatically learns effective image features; and 3) it unifies feature extraction, nonlinear mapping, and image reconstruction into one optimization framework. In the training stage, a CGAN is used to establish the nonlinear mapping between downsampled Landsat and MODIS images, and a multiscale superresolution CGAN is then trained between the original and downsampled Landsat images. The prediction procedure contains two layers, each consisting of a CGAN-based prediction and a fusion model; the two layers respectively realize the nonlinear mapping from MODIS to downsampled Landsat data and the superresolution reconstruction from downsampled to original Landsat images.
Result
Results on the benchmark datasets CIA (Coleambally irrigation area) and LGC (lower Gwydir catchment) show that the CGAN-based method achieves leading results on all four evaluation metrics. For example, on the CIA dataset, RMSE (root mean squared error), SAM (spectral angle mapper), SSIM (structural similarity), and ERGAS (erreur relative globale adimensionnelle de synthèse) are improved on average by 0.001, 0.15, 0.008, and 0.065, respectively; on the LGC dataset, the average improvements are 0.0012, 0.7, 0.018, and 0.0089, respectively. The method clearly outperforms existing sparse-representation-based and convolutional-neural-network-based methods.
Conclusion
The proposed conditional generative adversarial fusion model can fully learn the complex nonlinear mapping between Landsat and MODIS images and produce more accurate fusion results.
Objective
Spatiotemporal fusion of satellite images is an important problem in remote sensing fusion research. With the intensification of global environmental changes, satellite remote sensing data play an indispensable role in monitoring crop growth and landform changes. In the field of dynamic monitoring, high temporal resolution becomes an important attribute of the required remote sensing data because continuous observation is a basic requirement for dynamic monitoring. Moreover, the fragmentation of the global terrestrial landscape makes these applications require remote sensing data with higher spatial resolutions. However, remote sensing data with both high spatial and high temporal resolutions are difficult to capture with current satellite platforms due to constraints of technology and cost. For example, Landsat images have a high spatial resolution but a low temporal resolution. By contrast, MODIS (moderate-resolution imaging spectroradiometer) images have a high temporal resolution but a low spatial resolution. Spatiotemporal fusion provides an effective method to fuse these two types of remote sensing data with complementary spatial and temporal properties (Landsat and MODIS images are typical representatives) to generate fused data with high spatial and high temporal resolutions, which also brings great convenience to research on actual terrain and landform changes.
Method
A spatiotemporal fusion method based on the conditional generative adversarial network (CGAN), which can effectively handle massive remote sensing data in practical applications, is proposed to solve this problem. The GAN (generative adversarial network) is extended to the CGAN, which introduces the ground truth image as a condition variable to guide the learning of the discriminator network, making the training of the network more directional and easier. In this study, an asymmetric Laplacian pyramid network is used as the generator of the CGAN, and a VGG (visual geometry group) net is taken as the discriminator. The asymmetric Laplacian pyramid network mainly consists of two branches: a high-frequency branch (which mainly extracts image details or residual images) and a low-frequency branch (which extracts shallow features). The two branches progressively reconstruct the images in a coarse-to-fine manner. The discriminator of the CGAN is the VGG19 (visual geometry group 19-layer net) network, where the ReLU activation function is replaced by the Leaky ReLU function and the number of channels of the convolutional kernels is increased by a factor of 2 from 64 to 1 024. Then, a fully connected layer and a sigmoid activation function are used to obtain the probability of the sample class. In this study, a CGAN model is designed for the nonlinear mapping, and a CGAN superresolution model is designed to reconstruct original Landsat images from downsampled Landsat images. Compared with existing shallow learning methods, especially sparse-representation-based ones, the proposed CGAN-based model has the following merits: 1) explicitly correlating MODIS and downsampled Landsat images by learning a nonlinear mapping relationship, 2) automatically learning and extracting effective image features and image details, and 3) unifying feature extraction, nonlinear mapping, and image reconstruction into one optimization framework. In the training stage, a nonlinear mapping is first trained between the MODIS and downsampled Landsat data using the CGAN model. Then, a multiscale superresolution CGAN is trained between the downsampled Landsat and original Landsat data. The prediction procedure contains two layers, and each layer consists of a CGAN-based prediction and a fusion model. The fusion model adopts a high-pass model, which is detailed later in the paper. One of the two layers achieves the nonlinear mapping from the MODIS to downsampled Landsat data, and the other layer is the superresolution reconstruction network, which performs image superresolution at upsampling scales of two and five times.
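The resolution relationship underlying this two-stage design can be sketched in a few lines of numpy. This is an illustrative toy example only (the array sizes and the `block_mean` helper are hypothetical, not the paper's code): downsampling by block averaging, the 10-times gap between fine and coarse grids can be bridged by a 2-times stage followed by a 5-times stage, which is why the multiscale superresolution network works at scales of two and five.

```python
import numpy as np

def block_mean(img, factor):
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    return (img[:h - h % factor, :w - w % factor]
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))

# A toy 100 x 100 "Landsat" band; the "MODIS" grid is 10 times coarser.
landsat = np.random.rand(100, 100)

# Direct 10x downsampling vs. the two-stage 2x-then-5x path (2 * 5 = 10).
coarse_direct = block_mean(landsat, 10)
coarse_staged = block_mean(block_mean(landsat, 2), 5)

assert coarse_direct.shape == (10, 10)
assert np.allclose(coarse_direct, coarse_staged)  # block means compose
```

Because block means compose exactly, the staged path loses no information relative to direct downsampling, so the prediction pipeline can invert the gap one scale at a time.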
Result
Four indicators are commonly used to evaluate the performance of spatiotemporal fusion of remote sensing images. The first one is the root mean square error, which measures the radiometric difference between the fusion result and the ground truth. The spectral angle mapper is leveraged as the second index to measure the spectral distortion of the result. The structural similarity is taken as the third metric, measuring the similarity of the overall spatial structures between the fusion result and the ground truth. Finally, the erreur relative globale adimensionnelle de synthèse is selected as the last index to evaluate the overall fusion result. Extensive evaluations are executed on two groups of commonly used Landsat-MODIS benchmark datasets. For the fusion results, a quantitative evaluation of the visual effects of all predicted dates and one key date shows that the method achieves more accurate fusion results than sparse-representation-based methods and deep convolutional networks.
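The four indicators above can be sketched with their standard definitions in numpy. This is a simplified, global-statistics version for illustration only, not the paper's evaluation code: SSIM is usually averaged over local windows rather than computed on the whole image, and the ERGAS `scale` factor (fine-to-coarse pixel size ratio) is an assumed parameter.

```python
import numpy as np

def rmse(x, y):
    """Root mean square error between fused image x and ground truth y."""
    return np.sqrt(np.mean((x - y) ** 2))

def sam(x, y, eps=1e-12):
    """Mean spectral angle (radians) over pixels; x, y have shape (H, W, bands)."""
    dot = np.sum(x * y, axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.mean(np.arccos(cos))

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM with the standard stabilizing constants C1, C2."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ergas(x, y, scale=0.1, eps=1e-12):
    """ERGAS = 100 * scale * sqrt(mean over bands of (RMSE_k / mean_k)^2).
    scale is the fine/coarse pixel-size ratio (e.g. 1/10 for a 10x gap)."""
    terms = [(rmse(x[..., k], y[..., k]) / (y[..., k].mean() + eps)) ** 2
             for k in range(x.shape[-1])]
    return 100.0 * scale * np.sqrt(np.mean(terms))
```

For a perfect prediction the metrics reach their ideal values (RMSE 0, SAM 0, SSIM 1, ERGAS 0); lower RMSE, SAM, and ERGAS and higher SSIM indicate better fusion.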
Conclusion
A CGAN model that introduces an external condition to better reconstruct images is proposed. A nonlinear mapping CGAN is trained to deal with the highly nonlinear correspondence between downsampled Landsat and MODIS data. Moreover, a multiscale superresolution CGAN is trained to bridge the huge spatial resolution gap (10 times) between the original and downsampled Landsat data. Experimental comparisons are performed against existing methods, such as sparse-representation-based methods and deep convolutional neural network methods. Experimental results show that our model outperforms several state-of-the-art spatiotemporal fusion approaches.