Haze image recognition based on multi-scale feature and multi-adversarial networks

Chen Shuo1,2, Zhong Huicai1, Li Yongzhou1, Wang Shizheng1, Yang Jiangang1 (1. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Large-scale datasets such as ImageNet and mainstream network models such as ResNet can be applied directly and effectively to classification in clear scenes, but they suffer a considerable loss of accuracy in hazy scenes. Hazy scenes are complex and diverse, and annotating hazy data at scale is prohibitively expensive, so under present conditions it is essential to exploit the abundant labeled data and network models already available for ordinary scenes to accomplish classification and recognition in hazy scenes efficiently. Method This paper uses a low-cost data augmentation method that effectively reduces the difference between images in the pixel domain. Based on the ideas of feature diversity and feature-level adversarial learning, a multi-scale feature multi-adversarial network is proposed: multi-scale features are extracted from the data to make the feature-domain distribution more representative, and an adversarial mechanism is applied to multiple features to reduce the distribution discrepancy in the feature domain. Narrowing the distribution gaps in both the pixel domain and the feature domain further reduces the domain shift and improves classification and recognition accuracy in hazy scenes. Result On real, diverse hazy-scene data, ablation experiments show that after the pixel-domain augmentation the labeled clear images move closer in style to hazy images, and overall classification accuracy rises by 8.2%, at least 6.3% higher than other augmentation methods; applying the multi-scale feature multi-adversarial network in the feature domain improves accuracy by at least 8.0% over other networks. Conclusion The haze image recognition method that combines pixel-domain data augmentation with the multi-scale feature multi-adversarial network jointly addresses the domain distribution differences in the pixel and feature domains, draws on rich multi-scale feature information, and uses multiple adversarial branches to narrow the domain shift of hazy data, achieving better image classification and recognition results on a real, diverse hazy dataset.
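
The abstract leaves the pixel-domain augmentation unspecified. One common low-cost choice that matches the description, offered here purely as an illustrative assumption rather than the authors' method, is to render synthetic haze on labeled clear images with the atmospheric scattering model I = J*t + A*(1 - t). A minimal NumPy sketch, assuming a single global transmission t and atmospheric light A:

import numpy as np

def add_synthetic_haze(clear_img, t=0.6, A=0.9):
    # Atmospheric scattering model: I = J * t + A * (1 - t), where J is
    # the clear RGB image with values in [0, 1].
    # t: transmission in (0, 1]; smaller t means denser haze.
    # A: global atmospheric light (overall brightness of the haze).
    hazy = clear_img * t + A * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)

# Usage: sample a random haze density for each training image.
rng = np.random.default_rng(0)
clear = rng.random((224, 224, 3))  # stand-in for a labeled clear image
hazy = add_synthetic_haze(clear, t=rng.uniform(0.3, 0.8))

Varying t per image keeps the augmentation cheap while covering a range of haze densities in the pixel domain.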
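The network details are likewise not given in the abstract. The sketch below is a minimal DANN-style reading of "multi-scale feature multi-adversarial": feature maps tapped from several backbone stages each feed a domain discriminator through a gradient reversal layer, so the extractor is pushed to make clear-domain and hazy-domain features indistinguishable at every scale. The module names, layer sizes, channel counts, and use of PyTorch are all assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies gradients by -lam in the
    # backward pass, so minimizing the discriminator loss trains the
    # feature extractor to confuse the discriminator.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lam, None

class ScaleDiscriminator(nn.Module):
    # One domain classifier per feature scale, on globally pooled features.
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1))  # domain logit: clear vs. hazy

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))

# Hypothetical usage: feats holds feature maps from three backbone stages
# (ResNet-like channel counts assumed); the per-scale adversarial losses
# are summed and added to the ordinary classification loss.
feats = [torch.randn(4, c, s, s) for c, s in [(256, 56), (512, 28), (1024, 14)]]
discs = [ScaleDiscriminator(c) for c in (256, 512, 1024)]
domain = torch.ones(4, 1)  # this batch comes from the hazy domain
adv_loss = sum(nn.functional.binary_cross_entropy_with_logits(d(f), domain)
               for d, f in zip(discs, feats))

Training the extractor through the reversal layer while the discriminators are trained normally drives the multi-scale features of the two domains together, which is the mechanism the abstract credits with reducing the feature-domain shift.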
Keywords
