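Li Tong, Zhang Junping (School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001; Shanghai Institute of Satellite Engineering, Shanghai 201109)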
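Objective In high-resolution remote sensing image scene recognition, most classical supervised machine learning algorithms need sufficient labeled samples to train the model, whereas annotating remote sensing images is time-consuming and labor-intensive. To address the shortage of labeled samples in remote sensing scene recognition and the inability of different datasets to share labeled samples, a transfer learning network that combines adversarial learning with a variational autoencoder is proposed. Method A variational autoencoder (VAE) is trained on the source-domain dataset to obtain the parameters of the encoder and classifier networks, and the source encoder parameters are used to initialize the target encoder. Following the idea of adversarial learning, a discrimination network is introduced, and the parameters of the target encoder and the discrimination network are trained and updated alternately so that the features extracted by the target and source encoders become as similar as possible, which transfers the features of remote sensing images from the source domain to the target domain. Result Experiments on two remote sensing scene recognition datasets verify the effectiveness of the feature transfer algorithm, and transfer recognition between the SUN397 natural scene dataset and remote sensing scenes is also attempted; two transfer learning methods, correlation alignment and balanced distribution adaptation, are used for comparison. In the experiments between the two remote sensing datasets, the network after transfer learning improves scene recognition accuracy by about 10% compared with the network trained only on source-domain samples, and the improvement is more pronounced when a few labeled target-domain samples are used. Compared with the comparison methods, the proposed method improves recognition accuracy by more than 3% when a few labeled target-domain samples are used, and by 10% to 40% when only labeled source-domain samples are used. When the natural scene dataset is used, the method still improves scene recognition accuracy to some extent. Conclusion The proposed adversarial transfer learning network can make full use of sample information from other datasets when target-domain samples are scarce, achieve feature transfer and scene recognition across different scene image datasets, and effectively improve the scene recognition accuracy of remote sensing images.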
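The abstract states only that a VAE is trained on the source domain, that its encoder is paired with a classifier, and that the target encoder is initialized from the source encoder; it gives neither the architecture nor the loss. As a minimal sketch under those gaps, the standard VAE objective (negative evidence lower bound with the reparameterization trick) for one sample, plus the copy-initialization of the target encoder, can be written in plain Python. The linear encoder/decoder layers, the unit-variance Gaussian decoder, the parameter names `W_mu`, `b_mu`, `W_lv`, `b_lv`, and all helper names are hypothetical; the classifier fine-tuning and the gradient-based training loop (normally run in an autodiff framework) are not shown.

```python
import copy
import math
import random

def linear(W, b, x):
    """Affine map W @ x + b on plain Python lists (hypothetical layer, for illustration)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def negative_elbo(enc, dec, x, rng):
    """One-sample VAE loss: reconstruction error + KL regularizer.

    Assumes a unit-variance Gaussian decoder, so the reconstruction term is
    0.5 * squared error (additive constants dropped).
    """
    mu = linear(enc["W_mu"], enc["b_mu"], x)
    log_var = linear(enc["W_lv"], enc["b_lv"], x)
    z = reparameterize(mu, log_var, rng)
    x_hat = linear(dec["W"], dec["b"], z)
    recon = 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)

def init_target_encoder(source_encoder):
    """Target encoder starts as an independent copy of the (fine-tuned) source encoder."""
    return copy.deepcopy(source_encoder)

# Usage with made-up 2-D parameters (hypothetical values, not from the paper).
rng = random.Random(0)
enc = {"W_mu": [[0.5, -0.2], [0.1, 0.3]], "b_mu": [0.0, 0.0],
       "W_lv": [[0.0, 0.0], [0.0, 0.0]], "b_lv": [-1.0, -1.0]}
dec = {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0]}
loss = negative_elbo(enc, dec, [1.0, 2.0], rng)
target_enc = init_target_encoder(enc)
print(f"negative ELBO for one sample: {loss:.3f}")
```

Using `copy.deepcopy` matters here: the target encoder must start with the same weights as the source encoder but be updated independently, so a shallow copy that shares the nested weight lists would be a bug.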
Remote sensing image scene recognition based on adversarial learning
Li Tong, Zhang Junping (School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China; Shanghai Institute of Satellite Engineering, Shanghai 201109, China)
Objective In high-resolution remote sensing image scene recognition, classical supervised machine learning algorithms are considered effective under two conditions: 1) the test samples lie in the same feature space as the training samples, and 2) enough labeled samples are available to train the model fully. Deep learning algorithms, which have achieved remarkable results in image classification and object detection over the past few years, generally require a large number of labeled samples to learn accurate parameters. Mainstream image classification methods randomly draw training and test samples from the same dataset and use cross-validation to verify the effectiveness of the model. However, obtaining scene labels for remote sensing images is time-consuming and expensive. To address the shortage of labeled samples in remote sensing scene recognition, and the fact that labeled samples cannot be shared between datasets because of different sensors and complex lighting conditions, this paper investigates deep learning architectures and adversarial learning and proposes a feature transfer method based on an adversarial variational autoencoder (VAE). Method The feature transfer architecture consists of three parts. The first part is the pretraining module. Because few samples carry scene labels, an unsupervised model, the VAE, is adopted: the VAE is first trained without supervision on the source dataset, and its encoder is then fine-tuned together with a classifier network using the labeled samples of the source dataset. The second part is the adversarial learning module. Whereas most research uses adversarial learning to generate new samples, this paper uses it to transfer features from the source domain to the target domain. The parameters of the fine-tuned source encoder are used to initialize the target encoder.
Following the adversarial training used in generative adversarial networks (GAN), a discrimination network is introduced into the training of the target encoder. The target encoder aims to extract target-domain features that resemble source-domain features as closely as possible, so that the discrimination network cannot tell whether a feature comes from the source domain or the target domain. The discrimination network, in turn, optimizes its parameters to distinguish the two as well as possible. The process is called adversarial learning because the objectives of the encoder and the discrimination network conflict. By training and updating the parameters of the target encoder and the discrimination network alternately, the features extracted by the target encoder increasingly resemble those extracted by the source encoder. Once the discrimination network can no longer differentiate source features from target features, the target encoder can be assumed to extract features similar to those of the source samples, and feature transfer from the source domain to the target domain is accomplished. The third part is the target fine-tuning and test module. A small number of labeled target-domain samples are used to fine-tune the target encoder and the source classifier, and the remaining samples are used for evaluation. Result Two remote sensing scene recognition datasets, UCMerced-21 and NWPU-RESISC45, are used to demonstrate the effectiveness of the proposed feature transfer method. SUN397, a natural scene recognition dataset, is used to attempt cross-view feature transfer. Eight scene types shared by the three datasets, namely baseball field, beach, farmland, forest, harbor, industrial area, overpass, and river/lake, are selected for the feature transfer task. Correlation alignment (CORAL) and balanced distribution adaptation (BDA) are used as comparison methods.
In the adversarial transfer experiments between the two remote sensing scene recognition datasets, the proposed method improves recognition accuracy by about 10% compared with the network trained only on source-domain samples, and the improvement is larger when a few target-domain samples are involved. Compared with CORAL and BDA, the proposed method improves scene recognition accuracy by more than 3% when a few labeled target-domain samples are used, and by 10% to 40% when no target-domain samples are used. When the natural scene dataset is used, the improvement is smaller than between the two remote sensing datasets, but the proposed feature transfer method still increases scene recognition accuracy by approximately 6% after unsupervised feature transfer and by 36% after a small number of target-domain samples are involved in fine-tuning. Conclusion This paper proposes a transfer learning network based on an adversarial VAE. The experimental results show that, when labeled samples in the target domain are insufficient, the proposed adversarial learning method makes full use of the sample information in other datasets. The method effectively achieves feature transfer and scene recognition across different datasets and substantially improves scene recognition accuracy.
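The alternating update described in the Method part can be illustrated with a toy, one-dimensional sketch in the spirit of adversarial discriminative domain adaptation. Everything below is an assumption for illustration, not the authors' network: the frozen source encoder is the identity map, the hypothetical target encoder is an affine map `f = a * x + b` initialized as a copy of it (`a = 1`, `b = 0`), the discriminator is logistic regression `D(f) = sigmoid(w * f + c)` with label 1 for source and 0 for target, and the target encoder minimizes the inverted-label loss `-log D(f)`. Gradients are written out by hand and each step uses the full (small, synthetic) sample sets.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mean(values):
    return sum(values) / len(values)

def adversarial_adapt(xs_src, xs_tgt, steps=2000, lr_d=0.05, lr_e=0.05):
    """Toy alternating adversarial training on 1-D features (illustrative assumptions only).

    Source encoder: frozen identity map. Target encoder: f = a * x + b, initialized
    as a copy of the source encoder (a = 1, b = 0). Discriminator:
    D(f) = sigmoid(w * f + c), label 1 = source, 0 = target.
    """
    src_feats = list(xs_src)  # frozen source features, never updated
    a, b = 1.0, 0.0           # target encoder copied from the source encoder
    w, c = 0.0, 0.0           # discriminator parameters
    n_s, n_t = len(src_feats), len(xs_tgt)
    for _ in range(steps):
        # Step 1: discriminator update on binary cross-entropy
        # L_D = -mean_s log D(f_s) - mean_t log(1 - D(f_t)).
        tgt_feats = [a * x + b for x in xs_tgt]
        gw = gc = 0.0
        for f in src_feats:
            r = (sigmoid(w * f + c) - 1.0) / n_s  # dL_D / d(logit), source sample
            gw += r * f
            gc += r
        for f in tgt_feats:
            r = sigmoid(w * f + c) / n_t          # dL_D / d(logit), target sample
            gw += r * f
            gc += r
        w -= lr_d * gw
        c -= lr_d * gc
        # Step 2: target encoder update with the inverted-label loss
        # L_E = -mean_t log D(a * x + b), using the freshly updated discriminator.
        ga = gb = 0.0
        for x in xs_tgt:
            g = (sigmoid(w * (a * x + b) + c) - 1.0) * w / n_t  # dL_E / df
            ga += g * x
            gb += g
        a -= lr_e * ga
        b -= lr_e * gb
    return a, b, w, c

# Usage: source features ~ N(0, 1), target inputs shifted to ~ N(3, 1) (synthetic data).
rng = random.Random(0)
xs_src = [rng.gauss(0.0, 1.0) for _ in range(100)]
xs_tgt = [rng.gauss(3.0, 1.0) for _ in range(100)]
a, b, w, c = adversarial_adapt(xs_src, xs_tgt)
gap_before = abs(mean(xs_tgt) - mean(xs_src))
gap_after = abs(mean([a * x + b for x in xs_tgt]) - mean(xs_src))
print(f"gap between feature means: before {gap_before:.3f}, after {gap_after:.3f}")
```

Because a linear discriminator can only react to a difference in means, this toy aligns only the first moment of the two feature distributions, and training settles where the discriminator weight `w` returns to zero, i.e. where it can no longer separate the two domains. A deeper discrimination network, as used with image features in practice, can respond to richer differences between the distributions.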