Haze image recognition based on multi-scale feature and multi-adversarial networks
2021, Vol. 26, No. 11, pp. 2680-2690
Received: 2020-08-19; Revised: 2020-12-08; Accepted: 2020-12-15; Published in print: 2021-11-16
DOI: 10.11834/jig.200491
Objective
Current large-scale datasets such as ImageNet, together with mainstream network models such as ResNet, can be applied directly and efficiently to classification in clear-weather scenes, but they suffer a substantial loss of accuracy in hazy scenes. Hazy scenes are complex and varied, and annotating hazy data at scale is prohibitively expensive. Under these constraints, it is essential to exploit the large amount of existing labeled data and trained models from other scenes to accomplish classification and recognition in hazy scenes efficiently.
Method
This paper uses a low-cost data augmentation method that effectively reduces the discrepancy between images in the pixel domain. Building on the ideas of feature diversity and feature-level adversarial learning, a multi-scale feature multi-adversarial network is proposed: multi-scale features are extracted to make the learned representation more representative of the feature-domain distribution, and an adversarial mechanism is applied to several of these features to shrink the distribution discrepancy in the feature domain. By narrowing the distribution gaps in both the pixel domain and the feature domain, the domain shift is further reduced and classification accuracy in hazy scenes improves.
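This abstract does not spell out the augmentation method or the network internals, so the following is only a minimal PyTorch sketch under stated assumptions: a ResNet-18 backbone whose four stages supply the multi-scale features, synthetic haze rendered with the standard atmospheric scattering model (constant transmission, for simplicity) as the pixel-domain augmentation, and one gradient-reversal domain discriminator per scale. Every name here (add_haze, MultiScaleAdvNet, the layer widths) is hypothetical rather than taken from the paper.

import torch
import torch.nn as nn
from torchvision.models import resnet18

def add_haze(img, t=0.6, airlight=0.9):
    # Pixel-domain augmentation (assumed): atmospheric scattering model
    # I = J * t + A * (1 - t), with a constant transmission t for
    # simplicity; img is a tensor scaled to [0, 1].
    return img * t + airlight * (1.0 - t)

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; scaled, sign-flipped gradient in the
    # backward pass, so the backbone learns to fool the discriminators.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class MultiScaleAdvNet(nn.Module):
    # Backbone + classifier + one small domain discriminator per scale.
    def __init__(self, num_classes, lam=1.0):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512, num_classes)
        self.discs = nn.ModuleList(
            nn.Sequential(nn.Linear(c, 128), nn.ReLU(), nn.Linear(128, 2))
            for c in (64, 128, 256, 512))  # clear vs. hazy, per scale
        self.lam = lam

    def forward(self, x):
        x = self.stem(x)
        dom_logits = []
        for stage, disc in zip(self.stages, self.discs):
            x = stage(x)
            f = self.pool(x).flatten(1)
            dom_logits.append(disc(GradReverse.apply(f, self.lam)))
        return self.classifier(f), dom_logits

A training step would then sum a cross-entropy classification loss on the labeled clear and haze-augmented images with per-scale domain-classification losses on both domains; the reversed gradients push each scale's features toward domain-invariant statistics, which is the "multi-adversarial" part of the design.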
Result
On real-world, diverse hazy-scene data, ablation experiments show that with the pixel-domain data augmentation the labeled clear images become closer in style to hazy images, raising overall classification accuracy by 8.2%, at least 6.3% more than other data augmentation methods. Applying the multi-scale feature multi-adversarial network in the feature domain improves accuracy by at least 8.0% over competing networks.
Conclusion
The proposed haze image recognition method, which combines pixel-domain data augmentation with a multi-scale feature multi-adversarial network, accounts for domain distribution differences in both the pixel domain and the feature domain, exploits rich multi-scale feature information, and uses multiple adversarial discriminators to shrink the domain shift of hazy data, achieving better classification and recognition on a real-world, diverse hazy dataset.
Objective
In high-resolution remote sensing image scene recognition, classical supervised machine learning algorithms are considered effective on two conditions: 1) test samples should lie in the same feature space as training samples, and 2) adequate labeled samples should be provided to train the model fully. Deep learning algorithms, which have achieved remarkable results in image classification and object detection over the past few years, generally require a large number of labeled samples to learn accurate parameters. Mainstream image classification methods select training and test samples randomly from the same dataset and adopt cross-validation to verify the effectiveness of the model. However, obtaining scene labels is time-consuming and expensive for remote sensing images. To deal with the insufficiency of labeled samples in remote sensing scene recognition, and with the problem that labeled samples cannot be shared between datasets acquired by different sensors under complex lighting conditions, deep learning architectures and adversarial learning are investigated, and a feature transfer method based on an adversarial variational autoencoder is proposed.
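As an illustration only (this abstract does not give the encoder architecture), a minimal convolutional VAE with the reparameterization trick, of the kind that could serve as the unsupervised pretraining model described below, might look like the following sketch; the 64 x 64 input resolution and all layer sizes are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    # Minimal convolutional VAE for 64 x 64 RGB scene images (sizes assumed).
    def __init__(self, zdim=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16 -> 8
            nn.Flatten())
        self.mu = nn.Linear(128 * 8 * 8, zdim)
        self.logvar = nn.Linear(128 * 8 * 8, zdim)
        self.dec_fc = nn.Linear(zdim, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(self.dec_fc(z).view(-1, 128, 8, 8))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Standard VAE objective: reconstruction term + KL divergence;
    # assumes inputs scaled to [0, 1].
    rec = F.binary_cross_entropy(recon, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld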
Method
The feature transfer architecture can be divided into three parts. The first part is the pretrained model. Given the limited samples with scene labels, an unsupervised learning model, the variational autoencoder (VAE), is adopted. The VAE is trained without supervision on the source dataset, and the encoder part of the VAE is then fine-tuned together with a classifier network using labeled samples from the source dataset. The second part is the adversarial learning module. In most research, adversarial learning is adopted to generate new samples; here, the idea is instead used to transfer features from the source domain to the target domain. Parameters of the fine-tuned encoder for the source dataset are used to initialize the target encoder. Following the idea of adversarial training in generative adversarial networks (GANs), a discrimination network is introduced into the training of the target encoder. The goal of the target encoder is to extract features in the target domain that are as close as possible to those of the source domain, so that the discrimination network cannot tell whether a feature comes from the source or the target domain; the goal of the discrimination network is to optimize its parameters for better distinction. The procedure is called adversarial learning because of this contradiction between the objectives of the encoder and the discrimination network. By alternately training and updating the parameters of the target encoder and the discrimination network, the features extracted by the target encoder increasingly resemble those extracted by the source encoder. In this manner, by the time the discrimination network can no longer differentiate between source and target features, we can assume that the target encoder extracts features similar to those of the source samples, and remote sensing feature transfer between the source domain and the target domain is accomplished. The third part is the target fine-tuning and test module: a small number of labeled samples in the target domain is employed to fine-tune the target encoder and the source classifier, and the remaining samples are used for evaluation.
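A hedged sketch of the alternating update described above (one discriminator step, then one target-encoder step) could look as follows; src_enc, tgt_enc, make_encoder, and the discriminator widths are stand-ins for the assumed pipeline components, not the authors' actual implementation:

import torch
import torch.nn as nn

def make_encoder():
    # Stand-in for the fine-tuned VAE encoder's mean head (128-D features).
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))

src_enc = make_encoder()                       # pretrained on the source domain
tgt_enc = make_encoder()
tgt_enc.load_state_dict(src_enc.state_dict())  # initialize from the source encoder

disc = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
ce = nn.CrossEntropyLoss()
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_t = torch.optim.Adam(tgt_enc.parameters(), lr=1e-4)

def adversarial_step(xs, xt):
    # One alternating update: discriminator first, then target encoder.
    with torch.no_grad():
        fs = src_enc(xs)                       # source features (encoder frozen)
    ft = tgt_enc(xt)                           # target features

    # 1) Train the discriminator to label source features 0, target features 1.
    d_loss = (ce(disc(fs), torch.zeros(len(fs), dtype=torch.long))
              + ce(disc(ft.detach()), torch.ones(len(ft), dtype=torch.long)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the target encoder to fool the discriminator, i.e. to make
    #    its features look like source features (label 0).
    g_loss = ce(disc(ft), torch.zeros(len(ft), dtype=torch.long))
    opt_t.zero_grad()
    g_loss.backward()
    opt_t.step()
    return d_loss.item(), g_loss.item()

# e.g. one step on random stand-in batches:
# adversarial_step(torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64))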
Result
Two remote sensing scene recognition datasets, UCMerced-21 and NWPU-RESISC45, are adopted to prove the effectiveness of the proposed feature transfer method. SUN397, a natural scene recognition dataset, is employed for an attempt at cross-view feature transfer. Eight scene types common to the three datasets, namely baseball field, beach, farmland, forest, harbor, industrial area, overpass, and river/lake, are selected for the feature transfer task. Correlation alignment (CORAL) and balanced distribution adaptation (BDA) are used for comparison. In the experiments on adversarial learning between the two remote sensing datasets, the proposed method boosts recognition accuracy by about 10% compared with the network trained only on samples from the source domain.
Results improve further when a few samples from the target domain are involved. Compared with CORAL and BDA, the proposed method improves scene recognition accuracy by more than 3% when a few target-domain samples are used, and by 10%~40% without any target-domain samples. When transferring from natural scene images, the improvement is not as large as that from remote sensing images, but the scene recognition accuracy of the proposed feature transfer method is still increased by approximately 6% after unsupervised feature transfer, and by 36% after a small number of target-domain samples are involved in fine-tuning.
Conclusion
In this paper, an adversarial transfer learning network is proposed. The experimental results show that the proposed adversarial learning method can make the most of the sample information of another dataset when labeled samples are insufficient in the target domain. The proposed method achieves feature transfer between different datasets and effective scene recognition, and remarkably improves scene recognition accuracy.
Bousmalis K, Silberman N, Dohan D, Erhan D and Krishnan D. 2017. Unsupervised pixel-level domain adaptation with generative adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 95-104 [DOI: 10.1109/CVPR.2017.18]
Chen W T, Ding J J and Kuo S Y. 2019. PMS-Net: robust haze removal based on patch map for single images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11673-11681 [DOI: 10.1109/CVPR.2019.01195]
Chen W T, Fang H Y, Ding J J and Kuo S Y. 2020. PMHLD: patch map-based hybrid learning DehazeNet for single image haze removal. IEEE Transactions on Image Processing, 29: 6773-6788 [DOI: 10.1109/TIP.2020.2993407]
Duchi J, Hazan E and Singer Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12: 2121-2159
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F and Marchand M. 2017. Domain-adversarial training of neural networks//Csurka G, ed. Domain Adaptation in Computer Vision Applications. Cham: Springer: 189-209 [DOI: 10.1007/978-3-319-58347-1_10]
Ghifary M, Kleijn W B and Zhang M J. 2014. Domain adaptive neural networks for object recognition//Proceedings of the 13th Pacific Rim International Conference on Artificial Intelligence. Gold Coast, Australia: Springer: 898-904 [DOI: 10.1007/978-3-319-13560-1_76]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He K M, Sun J and Tang X O. 2011. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12): 2341-2353 [DOI: 10.1109/TPAMI.2010.168]
Kouw W M, van der Maaten L J P, Krijthe J H and Loog M. 2016. Feature-level domain adaptation. Journal of Machine Learning Research, 17: 1-32
Pan S J and Yang Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345-1359 [DOI: 10.1109/TKDE.2009.191]
Pei Y T, Huang Y P, Zou Q, Lu Y H and Wang S. 2018a. Does haze removal help CNN-based image classification?//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 697-712 [DOI: 10.1007/978-3-030-01249-6_42]
Pei Z Y, Cao Z J, Long M S and Wang J M. 2018b. Multi-adversarial domain adaptation//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: 3934-3941
Saito K, Watanabe K, Ushiku Y and Harada T. 2018. Maximum classifier discrepancy for unsupervised domain adaptation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3723-3732 [DOI: 10.1109/CVPR.2018.00392]
Sun B C and Saenko K. 2016. Deep CORAL: correlation alignment for deep domain adaptation//Proceedings of the European Conference on Computer Vision Workshops. Amsterdam, the Netherlands: Springer: 443-450 [DOI: 10.1007/978-3-319-49409-8_35]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9 [DOI: 10.1109/CVPR.2015.7298594]
Tzeng E, Hoffman J, Zhang N, Saenko K and Darrell T. 2014. Deep domain confusion: maximizing for domain invariance [EB/OL]. [2020-08-04]. https://arxiv.org/pdf/1412.3474.pdf
Wang J D, Feng W J, Chen Y Q, Yu H, Huang M Y and Yu P S. 2018. Visual domain adaptation with manifold embedded distribution alignment//Proceedings of the 26th ACM International Conference on Multimedia. Seoul, Korea (South): ACM: 402-410 [DOI: 10.1145/3240508.3240512]
Zeiler M D. 2012. ADADELTA: an adaptive learning rate method [EB/OL]. [2020-08-04]. https://arxiv.org/pdf/1212.5701.pdf
Zhang H and Patel V M. 2018. Densely connected pyramid dehazing network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3194-3203 [DOI: 10.1109/CVPR.2018.00337]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 532-542 [DOI: 10.1109/ICCV.2017.244]