Two-stage intelligent map generation guided by Transformer features
2023, Vol. 28, No. 10, pages 3281-3294
Print publication date: 2023-10-16
DOI: 10.11834/jig.220887
方政, 付莹, 刘利雄. 2023. Transformer特征引导的双阶段地图智能生成. 中国图象图形学报, 28(10):3281-3294
Fang Zheng, Fu Ying, Liu Lixiong. 2023. A dual of Transformer features-related map-intelligent generation method. Journal of Image and Graphics, 28(10):3281-3294
Objective
Existing intelligent map generation techniques do not account for the intra-class variability of geographical elements or the cross-domain differences of geographical elements inherent in the map generation task, so the quality of the generated maps fails to meet practical needs. To address these two issues, a two-stage intelligent map generation method guided by Transformer features is proposed.
Method
First, a feature extraction module is designed on the basis of the recent Transformer network. It extracts geographical-element features from the remote sensing image to guide map generation, resolving the generation difficulties caused by intra-class variability of geographical elements. A two-stage generation framework with two generative adversarial networks is then designed: the first, a preliminary generative adversarial network, produces a preliminary map image from the remote sensing image and the Transformer features; the second, a refinement generative adversarial network, turns the preliminary map image into a high-quality refined map image, alleviating the inaccurate rendering of geographical elements caused by cross-domain differences.
Result
Experiments were conducted on nine regions of the AIDOMG (aerial image dataset for online map generation) dataset against ten classical and recent methods, and the proposed method achieved the best results. On the Haikou region, compared with the Creative GAN method, the FID (Fréchet inception distance) value dropped by 16.0%, WD (Wasserstein distance) by 4.2%, and 1-NN (1-nearest neighbor) by 5.9%; on the Paris region, FID dropped by 2.9%, WD by 1.0%, and 1-NN by 2.1%.
Conclusion
Through high-quality Transformer feature guidance and a two-stage generation framework, the proposed method solves the degraded map generation quality caused by intra-class variability and cross-domain differences of geographical elements.
Objective
Intelligent map generation aims to produce map images quickly and at low cost. Existing methods take a remote sensing image as input and use a generative adversarial network (GAN) to generate the corresponding map image. However, they overlook two challenges of the map generation task: the intra-class differences of geographical elements within remote sensing images and the differences of geographical elements between domains. Intra-class difference means that geographical elements of the same category can have widely varying appearances in remote sensing images, which makes them hard to interpret; map generation therefore requires segmenting geographical elements so that visually diverse instances are resolved into their correct categories. Cross-domain difference means that corresponding geographical elements in remote sensing images and map images do not match exactly; for example, the edges of vegetation elements are irregular in remote sensing images but smooth in map images, so the generator must both synthesize map elements and keep them consistent with the underlying scene. To address these two challenges, we propose a two-stage intelligent map generation method guided by Transformer features.
Method
The model consists of three modules: a feature extraction module, a preliminary generative adversarial module, and a refinement generative adversarial module. First, the feature extraction module is built on the recent Transformer architecture, with a Swin-Transformer backbone and a segmentation branch. The Transformer's self-attention mechanism models global relationships within the image, giving it a large receptive field and effective feature extraction. The segmentation branch combines a pyramid pooling module (PPM) and a feature pyramid network (FPN). To obtain more effective geographic-element features, the feature pyramid extracts multi-level feature information, fusing high-level semantic information about geographic elements into the mid- and low-level features, while the PPM introduces global semantic information. The resulting features are fed to the segmentation branch, which uses the ground-truth segmentation results as guidance to learn effective geographic-element features. By extracting geographic-element features from the remote sensing image to guide map generation, this module addresses the generation difficulties caused by intra-class differences of geographical elements. Second, the preliminary generative adversarial module consists of a preliminary generator and a discriminator. The preliminary generator is a multi-scale generator built from a local generator and a global generator, both with encoder-decoder structures, and is designed to produce high-resolution images. Its input is the remote sensing image together with the geographic-element features, and its output is a preliminary map image.
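The pyramid pooling idea used in the segmentation branch can be illustrated with a small, self-contained sketch. This is plain Python over nested lists, not the paper's network code, and it assumes a square feature map whose side is divisible by every scale: the map is average-pooled into coarse summary maps at several scales, upsampled back, and stacked with the original so every position carries global context.

```python
def avg_pool(grid, k):
    """Average-pool a square 2D grid with window size and stride k."""
    m = len(grid) // k
    return [[sum(grid[i * k + di][j * k + dj]
                 for di in range(k) for dj in range(k)) / (k * k)
             for j in range(m)] for i in range(m)]

def upsample(grid, k):
    """Nearest-neighbour upsample by an integer factor k."""
    n = len(grid) * k
    return [[grid[i // k][j // k] for j in range(n)] for i in range(n)]

def ppm(feature, scales=(1, 2, 4)):
    """Pyramid pooling sketch: pool to an s x s summary at each scale,
    upsample back to full resolution, and stack with the input map."""
    n = len(feature)
    pyramid = [feature]                  # original feature map
    for s in scales:
        k = n // s                       # window yielding an s x s summary
        pyramid.append(upsample(avg_pool(feature, k), k))
    return pyramid                       # input + one context map per scale
```

A 2x2 map pooled at scale 1 yields a single global average broadcast back over the whole grid, which is exactly the global-context signal the PPM contributes.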
The discriminator is a multi-scale discriminator consisting of three sub-discriminators for high-resolution images; it takes the generated map and the real map as input and outputs a single-channel confidence map. Finally, the refinement generative adversarial module uses a refinement generator and shares the discriminator with the preliminary module. The refinement generator has the same multi-scale structure as the preliminary generator, with a local and a global generator; its input is the preliminary map image and its output is a refined map image. Together, the preliminary and refinement modules form a two-stage generation framework. The preliminary module takes the remote sensing image and the geographic-element features as input and produces a preliminary map image, which is still rough and contains incomplete geographical elements such as uneven road edges and fractures. The refinement module then takes this preliminary map as input and the real map as guidance to learn the geometric characteristics of geographical elements in real maps, yielding a high-quality refined map image and alleviating the locally inaccurate generation caused by cross-domain differences of geographical elements.
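The data flow of the two-stage framework can be sketched as follows. These are hypothetical stand-in functions, not the authors' networks (each stage is in reality a multi-scale GAN generator); the stubs only show how the three modules compose: features guide stage one, and stage two consumes only the preliminary map.

```python
def extract_features(rs_image):
    """Stand-in for the Swin-Transformer branch: per-pixel element label."""
    return [[1 if v > 0.5 else 0 for v in row] for row in rs_image]

def preliminary_gan(rs_image, features):
    """Stage 1 stand-in: fuse the image with geographic-element features
    into a rough map (here a simple blend)."""
    return [[0.5 * v + 0.5 * f for v, f in zip(vr, fr)]
            for vr, fr in zip(rs_image, features)]

def refinement_gan(rough_map):
    """Stage 2 stand-in: snap values toward clean map colours."""
    return [[round(v) for v in row] for row in rough_map]

def generate_map(rs_image):
    features = extract_features(rs_image)        # Transformer feature guidance
    rough = preliminary_gan(rs_image, features)  # preliminary map image
    return refinement_gan(rough)                 # refined map image
```

The key design point preserved by the sketch is that the refinement stage never sees the remote sensing image directly; it learns to regularize the preliminary map toward real-map geometry.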
Result
Experiments are carried out on nine regions of the aerial image dataset for online map generation (AIDOMG) in comparison with ten popular methods. On the Haikou region, the Fréchet inception distance (FID) is reduced by 16.0%, the Wasserstein distance (WD) by 4.2%, and the 1-nearest neighbor (1-NN) score by 5.9%. On the Paris region, FID is reduced by 2.9%, WD by 1.0%, and 1-NN by 2.1%. The comparison demonstrates that the proposed method effectively improves map generation. Ablation studies further verify the effectiveness of each module: as the modules are added one by one, the results improve steadily.
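The 1-NN score reported above is a standard two-sample test for generative models. A minimal sketch (plain Python, assuming samples are already embedded as feature vectors rather than raw images): pool the real and generated samples, classify each by its nearest neighbor among all other samples, and measure leave-one-out accuracy. Accuracy near 0.5 means the two sets are indistinguishable; values near 1.0 mean they separate easily, so lower is better.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def one_nn_score(real, fake):
    """Leave-one-out 1-NN accuracy over pooled real (label 0) and
    generated (label 1) samples. ~0.5 is ideal; near 1.0 is worse."""
    data = [(x, 0) for x in real] + [(x, 1) for x in fake]
    correct = 0
    for i, (x, label) in enumerate(data):
        j = min((k for k in range(len(data)) if k != i),
                key=lambda k: sq_dist(x, data[k][0]))
        correct += (data[j][1] == label)
    return correct / len(data)
```

Two well-separated clusters score 1.0 (easily told apart), while a generated set that exactly overlaps the real set scores 0.0 in this tiny example, since every point's nearest neighbor carries the opposite label.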
Conclusion
The proposed two-stage intelligent map generation method guided by Transformer features uses high-quality Transformer feature guidance and a two-stage generation framework to effectively solve the poor map generation quality caused by intra-class differences of geographical elements and by differences of geographical elements between domains.
Keywords: Transformer feature; remote sensing image; map image; intelligent map generation; generative adversarial network (GAN)
Ablameyko S V, Beregov B S and Kryuchkov A N. 1993. Computer-aided cartographical system for map digitizing//Proceedings of the 2nd International Conference on Document Analysis and Recognition. Tsukuba, Japan: IEEE: 115-118 [DOI: 10.1109/ICDAR.1993.395769]
Chen L W, Fang Z and Fu Y. 2022. Consistency-aware map generation at multiple zoom levels using aerial image. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15: 5953-5966 [DOI: 10.1109/JSTARS.2022.3170591]
Chen X, Chen S Q, Xu T, Yin B G, Peng J, Mei X M and Li H F. 2021. SMAPGAN: generative adversarial network-based semisupervised styled map tile generation method. IEEE Transactions on Geoscience and Remote Sensing, 59(5): 4388-4406 [DOI: 10.1109/TGRS.2020.3021819]
Choi Y, Choi M, Kim M, Ha J W, Kim S and Choo J. 2018. StarGAN: unified generative adversarial networks for multi-domain image-to-image translation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8789-8797 [DOI: 10.1109/CVPR.2018.00916]
de Boer P T, Kroese D P, Mannor S and Rubinstein R Y. 2005. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1): 19-67 [DOI: 10.1007/s10479-005-5724-z]
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2022-09-02]. https://arxiv.org/pdf/2010.11929.pdf
Fu Y, Liang S Z, Chen D D and Chen Z L. 2022. Translation of aerial image into digital map via discriminative segmentation and creative generation. IEEE Transactions on Geoscience and Remote Sensing, 60: #4703715 [DOI: 10.1109/TGRS.2021.3110894]
Ganguli S, Garzon P and Glaser N. 2019. GeoGAN: a conditional GAN with reconstruction and style loss to generate standard layer of maps from satellite images [EB/OL]. [2022-09-02]. https://arxiv.org/pdf/1902.05611.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Haunold P and Kuhn W. 1993. A keystroke level analysis of manual map digitizing//Proceedings of 1993 European Conference on Spatial Information Theory: A Theoretical Basis for GIS. Marciana Marina, Italy: Springer: 406-420 [DOI: 10.1007/3-540-57207-4_27]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI: 10.1109/CVPR.2017.632]
Jiang L M, Zhang C X, Huang M Y, Liu C X, Shi J P and Loy C C. 2020. TSIT: a simple and versatile framework for image-to-image translation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 206-222 [DOI: 10.1007/978-3-030-58580-8_13]
Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z and Shi W. 2017. Photo-realistic single image super-resolution using a generative adversarial network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 105-114 [DOI: 10.1109/CVPR.2017.19]
Liang J, Zeng H and Zhang L. 2021. High-resolution photorealistic image translation in real-time: a Laplacian pyramid translation network//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9387-9395 [DOI: 10.1109/CVPR46437.2021.00927]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets [EB/OL]. [2022-09-02]. https://arxiv.org/pdf/1411.1784.pdf
Park T, Liu M Y, Wang T C and Zhu J Y. 2019. Semantic image synthesis with spatially-adaptive normalization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2332-2341 [DOI: 10.1109/CVPR.2019.00244]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-09-02]. https://arxiv.org/pdf/1409.1556.pdf
Song H Z and Wu X J. 2019. High-quality image generation model for face aging/processing. Journal of Image and Graphics, 24(4): 592-602 [DOI: 10.11834/jig.180272]
Tang H, Xu D, Sebe N, Wang Y Z, Corso J J and Yan Y. 2019. Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2412-2421 [DOI: 10.1109/CVPR.2019.00252]
Tao T, Lyu G N, Zhang S L and Li Y N. 2007. Research progress and prospect of map symbol sharing in GIS. Journal of Image and Graphics, 12(8): 1326-1332 [DOI: 10.11834/jig.20070802]
Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J and Catanzaro B. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8798-8807 [DOI: 10.1109/CVPR.2018.00917]
Xiang X Y, Liu D, Yang X, Zhu Y H, Shen X H and Allebach J P. 2022. Adversarial open domain adaptation for sketch-to-photo synthesis//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 944-954 [DOI: 10.1109/WACV51458.2022.00102]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zhong L, Onishi R, Wang L, Ruan L F and Tan S J. 2021. A scalable blockchain-based high-definition map update management system//Proceedings of 2021 IEEE International Smart Cities Conference. Manchester, UK: IEEE: 1-4 [DOI: 10.1109/ISC253183.2021.9562840]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]