Remote sensing image scene recognition based on adversarial learning
2021, Vol. 26, No. 11: 2732-2740
Received: 2020-07-28
Revised: 2020-10-29
Accepted: 2020-11-05
Published in print: 2021-11-16
DOI: 10.11834/jig.200419
Objective
In high-resolution remote sensing image scene recognition, most classical supervised machine learning algorithms require sufficient labeled samples to train the model, while annotating remote sensing images is time-consuming and labor-intensive. To address the lack of labeled samples in remote sensing scene recognition and the problem that labeled samples cannot be shared across different datasets, a transfer learning network combining adversarial learning with a variational autoencoder is proposed.
Method
A variational autoencoder (VAE) is first trained on the source dataset to obtain the encoder and classifier network parameters, and the source encoder parameters are used to initialize the target encoder. Following the idea of adversarial learning, a discrimination network is introduced, and the parameters of the target encoder and the discrimination network are trained and updated alternately so that the features extracted by the target encoder become as similar as possible to those extracted by the source encoder, thereby transferring features from the source domain to the target domain of remote sensing images.
Result
Experiments on two remote sensing scene recognition datasets verify the effectiveness of the proposed feature transfer algorithm, and transfer between the SUN397 natural scene dataset and remote sensing scenes is also attempted. Two transfer learning methods, correlation alignment (CORAL) and balanced distribution adaptation (BDA), are used for comparison. In the experiments between the two remote sensing scene datasets, the scene recognition accuracy of the network after transfer learning improves by about 10% compared with the network trained only on source-domain samples, and the improvement is more pronounced when a small number of labeled target-domain samples are used. Compared with the baselines, the proposed method improves recognition accuracy by more than 3% when a few labeled target-domain samples are used, and by 10% to 40% when only source-domain labeled samples are used. With the natural scene dataset, the method still improves scene recognition accuracy to some extent.
Conclusion
The proposed adversarial transfer learning network can make full use of sample information from other datasets when target-domain samples are scarce, achieving feature transfer and scene recognition across different scene image datasets and effectively improving the scene recognition accuracy of remote sensing images.
Objective
While dealing with high-resolution remote sensing image scene recognition, classical supervised machine learning algorithms are considered effective on two conditions: 1) test samples should be in the same feature space as training samples, and 2) adequate labeled samples should be provided to train the model fully. Deep learning algorithms, which have achieved remarkable results in image classification and object detection over the past few years, generally require a large number of labeled samples to learn accurate parameters. Mainstream image classification methods select training and test samples randomly from the same dataset and adopt cross validation to verify the effectiveness of the model. However, obtaining scene labels is time-consuming and expensive for remote sensing images. To deal with the insufficiency of labeled samples in remote sensing image scene recognition and the problem that labeled samples cannot be shared between different datasets due to different sensors and complex lighting conditions, deep learning architecture and adversarial learning are investigated, and a feature transfer method based on an adversarial variational autoencoder (VAE) is proposed.
Method
The feature transfer architecture can be divided into three parts. The first part is the pretraining module. Given the limited samples with scene labels, an unsupervised learning model, the VAE, is adopted. The VAE is trained without supervision on the source dataset, and the encoder part of the VAE is then fine-tuned together with a classifier network using the labeled samples in the source dataset, as illustrated by the sketch below.
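As a concrete illustration of the pretraining module, the following is a minimal PyTorch sketch of a VAE encoder with the standard reparameterization trick and an ELBO-style loss. The layer sizes, the fully connected backbone, and the names VAEEncoder and vae_loss are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEEncoder(nn.Module):
    """Minimal VAE encoder: maps an input image to a latent code (sketch only)."""
    def __init__(self, in_dim=256 * 256 * 3, latent_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, latent_dim)
        self.fc_logvar = nn.Linear(512, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def vae_loss(x_recon, x, mu, logvar):
    """ELBO-style loss: reconstruction error plus KL divergence to N(0, I).
    Assumes the (not shown) decoder outputs a flattened reconstruction."""
    recon = F.mse_loss(x_recon, x.flatten(1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

After this unsupervised phase, the encoder weights would be fine-tuned jointly with a small classifier head on the labeled source samples.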
The second part is the adversarial learning module. In most research, adversarial learning is adopted to generate new samples; here, the idea is used instead to transfer features from the source domain to the target domain. Parameters of the fine-tuned encoder network for the source dataset are used to initialize the target encoder. Following the idea of adversarial training in generative adversarial networks (GANs), a discrimination network is introduced into the training of the target encoder. The goal of the target encoder is to extract target-domain features that resemble source-domain features as closely as possible, so that the discrimination network cannot tell whether a feature comes from the source domain or the target domain. The goal of the discrimination network is to optimize its parameters for better distinction. The contradiction between these two objectives is what makes the learning adversarial. By alternately training and updating the parameters of the target encoder and the discrimination network, the features extracted by the target encoder increasingly resemble those extracted by the source encoder. Once the discrimination network can no longer differentiate source features from target features, the target encoder can be assumed to extract features similar to those of the source samples, and the feature transfer between the source domain and the target domain is accomplished. The third part is the target fine-tuning and test module. A small number of labeled samples in the target domain is employed to fine-tune the target encoder and the source classifier, and the remaining samples are used for evaluation. The alternating adversarial update is sketched below.
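The alternating update at the heart of the adversarial module can be summarized in a short training loop. The following is a minimal PyTorch sketch under simplifying assumptions, not the authors' released code: source_encoder, target_encoder, and discriminator are hypothetical modules, and the encoders are assumed to return a single feature tensor (for example, the latent mean).

```python
import torch
import torch.nn as nn

def adversarial_transfer(source_encoder, target_encoder, discriminator,
                         source_loader, target_loader, epochs=20, lr=1e-4):
    """Alternately update the discriminator and the target encoder so that
    target-domain features become indistinguishable from source-domain features."""
    source_encoder.eval()  # the source encoder stays frozen throughout
    bce = nn.BCEWithLogitsLoss()
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    opt_t = torch.optim.Adam(target_encoder.parameters(), lr=lr)

    for _ in range(epochs):
        for (x_src, _), (x_tgt, _) in zip(source_loader, target_loader):
            with torch.no_grad():
                f_src = source_encoder(x_src)   # "real" features, no gradient
            f_tgt = target_encoder(x_tgt)       # features to be aligned

            # Step 1: train the discriminator to label source features 1
            # and target features 0.
            opt_d.zero_grad()
            loss_d = bce(discriminator(f_src), torch.ones(len(x_src), 1)) \
                   + bce(discriminator(f_tgt.detach()), torch.zeros(len(x_tgt), 1))
            loss_d.backward()
            opt_d.step()

            # Step 2: train the target encoder to fool the discriminator,
            # i.e. make its features receive the "source" label 1.
            opt_t.zero_grad()
            loss_t = bce(discriminator(f_tgt), torch.ones(len(x_tgt), 1))
            loss_t.backward()
            opt_t.step()
```

Training stops when the discriminator performs near chance level, at which point the target fine-tuning and test module takes over with the few labeled target samples.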
Result
Two remote sensing scene recognition datasets, UCMerced-21 and NWPU-RESISC45, are adopted to prove the effectiveness of the proposed feature transfer method. SUN397, a natural scene recognition dataset, is employed in an attempt at cross-view feature transfer. Eight scene types common to the three datasets, namely baseball field, beach, farmland, forest, harbor, industrial area, overpass, and river/lake, are selected for the feature transfer task. Correlation alignment (CORAL) and balanced distribution adaptation (BDA) are used as comparisons. In the experiments of adversarial learning between the two remote sensing scene recognition datasets, the proposed method boosts recognition accuracy by about 10% compared with the network trained only on samples in the source domain. Results improve more substantially when a few samples in the target domain are involved. Compared with CORAL and BDA, the proposed method improves scene recognition accuracy by more than 3% when using a few samples in the target domain and by 10% to 40% without samples in the target domain. When using the information of natural scene images, the improvement is not as large as that between remote sensing datasets, but the scene recognition accuracy using the proposed feature transfer method is still increased by approximately 6% after unsupervised feature transfer and by 36% after a small number of samples in the target domain are involved in fine-tuning.
Conclusion
In this paper, an adversarial VAE-based transfer learning network is proposed. The experimental results show that the proposed adversarial learning method can make the most of sample information from other datasets when labeled samples are insufficient in the target domain. The proposed method achieves feature transfer between different datasets and scene recognition effectively, and remarkably improves scene recognition accuracy.
Chaib S, Liu H, Gu Y F and Yao H X. 2017. Deep feature fusion for VHR remote sensing scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(8): 4775-4784[DOI: 10.1109/TGRS.2017.2700322]
Cheng G, Han J W and Lu X Q. 2017. Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE, 105(10): 1865-1883[DOI: 10.1109/JPROC.2017.2675998]
Cheng G, Yang C Y, Yao X W, Guo L and Han J W. 2018. When deep learning meets metric learning: remote sensing image scene classification via learning discriminative CNNs. IEEE Transactions on Geoscience and Remote Sensing, 56(5): 2811-2821[DOI: 10.1109/TGRS.2017.2783902]
Gong X, Xie Z, Liu Y Y, Shi X G and Zheng Z. 2018. Deep salient feature based anti-noise transfer network for scene classification of remote sensing imagery. Remote Sensing, 10(3): #410[DOI: 10.3390/rs10030410]
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Hinton G E and Salakhutdinov R R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504-507[DOI: 10.1126/science.1127647]
Hu F, Xia G S, Hu J W and Zhang L P. 2015. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sensing, 7(11): 14680-14707[DOI: 10.3390/rs71114680]
Kingma D P and Welling M. 2014. Auto-encoding variational Bayes[EB/OL]. [2020-07-28]. http://arxiv.org/pdf/1312.6114.pdf
Li G D, Zhang C J, Wang M K, Zhang X Y and Gao F. 2019. Transfer learning using convolutional neural network for scene classification within high resolution remote sensing image. Science of Surveying and Mapping, 44(4): 116-123, 174[DOI: 10.16251/j.cnki.1009-2307.2019.04.018]
Long M S, Wang J M, Ding G G, Sun J G and Yu P S. 2013. Transfer feature learning with joint distribution adaptation//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 2200-2207[DOI: 10.1109/ICCV.2013.274]
Matasci G, Volpi M, Kanevski M, Bruzzone L and Tuia D. 2015. Semisupervised transfer component analysis for domain adaptation in remote sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 53(7): 3550-3564[DOI: 10.1109/TGRS.2014.2377785]
Sun B C, Feng J S and Saenko K. 2016. Return of frustratingly easy domain adaptation//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press: 2058-2065
Tong W, Chen W, Han W, Li X and Wang L. 2020. Channel-attention-based densenet network for remote sensing image scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13: 4121-4132[DOI: 10.1109/JSTARS.2020.3009352]
Wang G L, Fan B, Xiang S M and Pan C H. 2017a. Aggregating rich hierarchical features for scene classification in remote sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(9): 4104-4115[DOI: 10.1109/JSTARS.2017.2705419]
Wang J D, Chen Y Q, Hao S J, Feng W J and Shen Z Q. 2017b. Balanced distribution adaptation for transfer learning//Proceedings of 2017 IEEE International Conference on Data Mining (ICDM). New Orleans, USA: IEEE: 1129-1134[DOI: 10.1109/ICDM.2017.150]
Wang Q, Liu S T, Chanussot J and Li X L. 2019. Scene classification with recurrent attention of VHR remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 57(2): 1155-1167[DOI: 10.1109/TGRS.2018.2864987]
Wu R, Li Y, Han H, Chen X and Lin Y. 2019. Remote sensing image analysis based on transfer learning: a survey//Proceedings of International Conference on Advanced Hybrid Information Processing. Nanjing, China: Springer: 408-415[DOI: 10.1007/978-3-030-19086-6_45]
Xiao J X, Ehinger K A, Hays J, Torralba A and Oliva A. 2016. SUN database: exploring a large collection of scene categories. International Journal of Computer Vision, 119(1): 3-22[DOI: 10.1007/s11263-014-0748-y]
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, USA: Association for Computing Machinery: 270-279[DOI: 10.1145/1869790.1869829]
Yao Y, Liang H, Li X, Zhang J and He J. 2017. Sensing urban land-use patterns by integrating Google Tensorflow and scene-classification models//The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-2/W7: 981-988[DOI: 10.5194/isprs-archives-XLII-2-W7-981-2017]
Zhang J P, Li T, Lu X C and Cheng Z. 2016. Semantic classification of high-resolution remote-sensing images based on mid-level features. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(6): 2343-2353[DOI: 10.1109/JSTARS.2016.2536943]
Zhao B, Zhong Y F, Zhang L P and Huang B. 2016. The Fisher kernel coding framework for high spatial resolution scene classification. Remote Sensing, 8(2): #157[DOI: 10.3390/rs8020157]
Zhu Q Q, Zhong Y F, Wu S Q, Zhang L P and Li D R. 2018. Scene classification based on the sparse homogeneous-heterogeneous topic feature model. IEEE Transactions on Geoscience and Remote Sensing, 56(5): 2689-2703[DOI: 10.1109/TGRS.2017.2781712]