Infrared-to-visible image translation based on parallel generator network
2021, Vol. 26, No. 10, pp. 2346-2356
Received: 2020-04-03; Revised: 2020-09-23; Accepted: 2020-09-30; Published in print: 2021-10-16
DOI: 10.11834/jig.200113
Objective
To address the single-structure generator network problem in existing deep learning models for image translation, this paper improves the structure of the conditional generative adversarial network (CGAN) and proposes a parallel generator network model that fuses two different architectures, a residual network (ResNet) and a densely connected network (DenseNet).
Method
Residual and dense generator branch networks are constructed; an infrared image is fed into the residual and dense generator branches, each of which produces its own translated visible image, and a linear interpolation algorithm based on image segmentation is proposed to fuse the translated images from the branch generators into the final translated visible image, as sketched below. To prevent overfitting when training on small samples, dropout layers are inserted into the discriminator network. An optimal-threshold segmentation objective function is designed to obtain the optimal fusion parameters during training of the parallel generator network.
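A minimal sketch of the segmentation-and-interpolation fusion step described above, in Python/NumPy. The branch outputs (`img_res`, `img_dense`), the intensity threshold, the per-region weights, and the use of the ResNet-branch intensity as the segmentation cue are illustrative placeholders; the paper learns the threshold and fusion parameters with its optimal-threshold objective, which is not reproduced here.

```python
import numpy as np

def fuse_branches(img_res, img_dense, threshold, w_low, w_high):
    """Fuse the ResNet-branch and DenseNet-branch outputs (H x W x 3, values in [0, 255])
    by thresholding intensity into low/high regions and linearly interpolating the two
    branch images in each region and each RGB channel (illustrative sketch only)."""
    intensity = img_res.mean(axis=2, keepdims=True)       # assumed segmentation cue
    low = (intensity <= threshold).astype(img_res.dtype)  # low-intensity region mask
    high = 1.0 - low                                       # high-intensity region mask

    fused_low = w_low * img_res + (1.0 - w_low) * img_dense     # interpolation, low region
    fused_high = w_high * img_res + (1.0 - w_high) * img_dense  # interpolation, high region
    return low * fused_low + high * fused_high
```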
Result
Tested on a public infrared-visible dataset, the proposed method achieves significant improvements in mean square error (MSE) and structural similarity (SSIM) over existing deep learning image translation models such as Pix2Pix and CycleGAN.
Conclusion
The parallel generator network model effectively combines the advantages of its branch network structures, and its image translation results are more accurate and realistic.
Objective
Image-to-image translation involves the automated conversion of input data into a corresponding output image that differs in characteristics such as color and style. Examples include converting a photograph to a sketch or a visible image to a semantic label map. Translation has various applications in computer vision, such as facial recognition, person identification, and image dehazing. In 2014, Goodfellow et al. proposed an image generation model based on generative adversarial networks (GANs). This algorithm uses a loss function to classify output images as authentic or fabricated while simultaneously training a generative model to minimize that loss, and GANs have achieved impressive image generation results specifically through this adversarial loss. For example, the image-to-image translation framework Pix2Pix was developed using a GAN architecture. Pix2Pix operates by learning a conditional generative model from input-output image pairs, which is well suited to translation tasks; in addition, a U-Net is often used as the generator network in place of a conventional encoder-decoder. While Pix2Pix provides a robust framework for image translation, acquiring sufficient quantities of paired input-output training data can be challenging. To address this problem, cycle-consistent adversarial networks (CycleGANs) were developed by adding an inverse mapping and a cycle-consistency loss to enforce the relationship between generated and input images; ResNets have also been used as generators to enhance translated image quality. Pix2PixHD offers high-resolution (2 048×1 024 pixels) output using a modified multiscale generator network that includes an instance map in the training step. Although these algorithms have been used effectively for image-to-image translation and a variety of related applications, they typically adopt U-Net or ResNet generators, and such single-structure networks struggle to maintain high performance across multiple evaluation indicators. This study therefore presents a novel parallel stream-based generator network to increase robustness across multiple evaluation indicators. Unlike previous studies, the proposed model consists of two entirely different convolutional neural network (CNN) structures, and the translated visible images output by the two streams are fused with a linear interpolation-based method that allows the parameters of both models to be optimized simultaneously.
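For reference, the standard conditional adversarial objective that Pix2Pix-style translators optimize (Mirza and Osindero 2014; Isola et al. 2017), with $x$ the input (here infrared) image, $y$ the target visible image, $G$ the generator, and $D$ the discriminator:

$$\min_{G}\max_{D}\;\mathcal{L}_{\mathrm{cGAN}}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x}\left[\log\left(1-D\left(x,G(x)\right)\right)\right]$$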
Method
The proposed parallel generator network consists of one ResNet processing stream and one DenseNet processing stream, which are fused in parallel. The ResNet stream includes down-sampling layers and nine Res-Unit feature extraction blocks; each Res-Unit is a feedforward block in which an elementwise addition skips over two convolution layers. Similarly, the DenseNet stream includes down-sampling layers and nine Den-Unit feature extraction blocks; each Den-Unit is composed of three convolutional layers and two concatenation layers, so the Den-Unit output draws on a concatenation of the deep feature maps produced by all three convolutional layers. To exploit the advantages of both the ResNet and DenseNet streams, the two generated images are segmented into low- and high-intensity parts with an optimal intensity threshold, and a linear interpolation method is proposed to fuse the segmented outputs of the two generator streams in the R, G, and B channels, respectively. We also design an intensity-threshold objective function to obtain the optimal parameters during the generator training process. In addition, to avoid overfitting when training on a small dataset, we modify the discriminator structure to include four convolution-dropout pairs followed by a convolution layer.
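A minimal PyTorch sketch of the three building blocks described above. Channel widths, kernel sizes, activations, and the dropout rate are illustrative assumptions; the abstract does not specify them, and the Den-Unit wiring shown is one plausible reading of the description.

```python
import torch
import torch.nn as nn

class ResUnit(nn.Module):
    """Feedforward block whose elementwise addition skips over two convolution layers."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # skip connection over the two conv layers

class DenUnit(nn.Module):
    """Three convolution layers linked by two concatenation layers, so the last
    convolution sees the stacked feature maps of the earlier layers (DenseNet-style)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, x):
        f1 = torch.relu(self.conv1(x))
        f2 = torch.relu(self.conv2(torch.cat([x, f1], dim=1)))  # first concatenation
        return self.conv3(torch.cat([x, f1, f2], dim=1))        # second concatenation

class Discriminator(nn.Module):
    """Four convolution-dropout pairs followed by a final convolution layer."""
    def __init__(self, in_ch=6, base=64, p=0.5):  # in_ch: e.g., stacked IR + visible pair (assumption)
        super().__init__()
        layers, ch = [], in_ch
        for i in range(4):
            layers += [nn.Conv2d(ch, base * 2 ** i, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True),
                       nn.Dropout(p)]            # dropout to reduce overfitting on small data
            ch = base * 2 ** i
        layers += [nn.Conv2d(ch, 1, 4, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, pair):
        return self.net(pair)  # patch-level real/fake scores
```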
Result
We compared our model with six state-of-the-art image-to-image translation models, namely CRN (cascaded refinement networks), SIMS (semi-parametric image synthesis), Pix2Pix (pixel to pixel), CycleGAN (cycle-consistent generative adversarial networks), MUNIT (multimodal unsupervised image-to-image translation), and GauGAN (spatially-adaptive normalization generative adversarial networks), on the public AAU (Aalborg University) RainSnow Traffic Surveillance Dataset. The experimental dataset, composed of 22 five-minute video sequences acquired at traffic intersections in the Danish cities of Aalborg and Viborg, was used for testing purposes. It was collected at seven different locations with a conventional RGB camera and a thermal camera, each with a resolution of 640×480 pixels, at 20 frames per second. The total experimental dataset consisted of 2 100 RGB-IR image pairs, and each scene was randomly divided into training and test sets at an 80%/20% ratio. Multi-perspective evaluation results were acquired using the mean square error (MSE), structural similarity index (SSIM), gray-intensity histogram correlation, and Bhattacharyya distance.
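These four measures can be computed with common libraries as sketched below (scikit-image and OpenCV); the bin count and normalization are assumptions, as the abstract does not state the exact settings used.

```python
import cv2
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def evaluate_pair(pred_rgb, gt_rgb):
    """pred_rgb, gt_rgb: uint8 H x W x 3 translated and ground-truth visible images."""
    mse = mean_squared_error(gt_rgb, pred_rgb)
    ssim = structural_similarity(gt_rgb, pred_rgb, channel_axis=2)

    # Gray-intensity histograms (256 bins), normalized before comparison.
    h_pred = cv2.calcHist([cv2.cvtColor(pred_rgb, cv2.COLOR_RGB2GRAY)], [0], None, [256], [0, 256])
    h_gt = cv2.calcHist([cv2.cvtColor(gt_rgb, cv2.COLOR_RGB2GRAY)], [0], None, [256], [0, 256])
    cv2.normalize(h_pred, h_pred)
    cv2.normalize(h_gt, h_gt)

    corr = cv2.compareHist(h_pred, h_gt, cv2.HISTCMP_CORREL)           # higher is better
    bhatta = cv2.compareHist(h_pred, h_gt, cv2.HISTCMP_BHATTACHARYYA)  # lower is better
    return mse, ssim, corr, bhatta
```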
The advantages of the parallel stream-based generator were assessed by comparing the proposed parallel generator with a ResNet, a DenseNet, and a residual dense block (RDB)-based hybrid network (RDN). We evaluated the average MSE and SSIM values over the test data produced by the four generators (ParaNet, ResNet, DenseNet, and RDN). The proposed method achieved an average MSE of 34.835 8, lower than those of the ResNet, DenseNet, and hybrid RDN networks. At the same time, the average SSIM value produced with the proposed method was 0.747 7, higher than those of DenseNet, ResNet, and RDN. This result shows that the proposed parallel structure yields more effective fusion than the RDB-based hybrid network structure. Moreover, the comparative experiments demonstrate that the parallel generator structure improves robustness across multi-perspective evaluations for infrared-to-visible image translation. Compared with the six conventional methods, the MSE (lower is better) improved by at least 22.30% and the SSIM (higher is better) improved by at least 8.55%. These results show that the proposed parallel generator network-based infrared-to-visible image translation model outperforms conventional deep learning models such as CRN, SIMS, Pix2Pix, CycleGAN, MUNIT, and GauGAN in terms of both MSE and SSIM.
Conclusion
A novel parallel stream architecture-based generator network was proposed for infrared-to-visible image translation. Unlike conventional models, the proposed parallel generator consists of two different network architectures: a ResNet and a DenseNet. Parallel linear combination-based fusion allows the model to draw on the benefits of both networks simultaneously. The discriminator network used in the conditional GAN framework was also improved for training and for identifying the optimal ParaNet parameters. The experimental results showed that combining the two different networks improved the common assessment metrics: the proposed parallel generator network achieved higher SSIM and intensity-histogram similarity, and lower MSE, than existing models. In the future, this algorithm will be applied to image dehazing.
Chen Q F and Koltun V. 2017. Photographic image synthesis with cascaded refinement networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 1520-1529 [DOI: 10.1109/ICCV.2017.168]
Chen W, Li Z W and Yin Z. 2019. Image dehazing algorithm based on generative adversarial network. Information and Control, 48(6): 707-714, 722 [DOI: 10.13976/j.cnki.xk.2019.9078]
Engin D, Genç A and Ekenel H K. 2018. Cycle-Dehaze: enhanced CycleGAN for single image dehazing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City, USA: IEEE: 825-833 [DOI: 10.1109/CVPRW.2018.00127]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE [DOI: 10.1109/CVPR.2017.243]
Huang H, Tao H J and Wang H F. 2019. Low-illumination image enhancement using a conditional generative adversarial network. Journal of Image and Graphics, 24(12): 2149-2158 [DOI: 10.11834/jig.190145]
Huang X, Liu M Y, Belongie S and Kautz J. 2018. Multimodal unsupervised image-to-image translation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 179-196 [DOI: 10.1007/978-3-030-01219-9_11]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks[EB/OL]. [2016-11-21]. https://arxiv.org/pdf/1611.07004.pdf
Jensen M B, Bahnsen C H, Lahrmann H S, Madsen T K O and Moeslund T B. 2018. Collecting traffic video data using portable poles: survey, proposal, and analysis. Journal of Transportation Technologies, 8(4): 376-400[DOI: 10.4236/jtts.2018.84021]
Liu Z L, Zhu W and Yuan Z Y. 2019. Image instance style transfer combined with fully convolutional network and CycleGAN. Journal of Image and Graphics, 24(8): 1283-1291 [DOI: 10.11834/jig.180624]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets[EB/OL]. [2014-11-06]. https://arxiv.org/pdf/1411.1784.pdf
Park T, Liu M Y, Wang T C and Zhu J Y. 2019. Semantic image synthesis with spatially-adaptive normalization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2332-2341 [DOI: 10.1109/CVPR.2019.00244]
Peng Z M, Li Z C, Zhang J G, Li Y, Qi G J and Tang J H. 2019. Few-shot image recognition with knowledge transfer//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 441-449 [DOI: 10.1109/ICCV.2019.00053]
Qi X J, Chen Q F, Jia J Y and Koltun V. 2018. Semi-parametric image synthesis//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8808-8816 [DOI: 10.1109/CVPR.2018.00918]
Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J and Catanzaro B. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8798-8807 [DOI: 10.1109/CVPR.2018.00917]
Wei L H, Zhang S L, Gao W and Tian Q. 2018. Person transfer GAN to bridge domain gap for person re-identification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 79-88 [DOI: 10.1109/CVPR.2018.00016]
Yang X L, Lin S Z, Lu X F, Wang L F, Li D W and Wang B. 2019. Multimodal image fusion based on generative adversarial networks. Laser and Optoelectronics Progress, 56(16): #161004 [DOI: 10.3788/LOP56.161004]
Yao N M, Guo Q P, Qiao F C, Chen H and Wang H A. 2018. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 44(5): 865-877 [DOI: 10.16383/j.aas.2018.c170477]
Zhang H, Han H, Cui J Y, Shan S G and Chen X L. 2018. RGB-D face recognition via deep complementary and common feature learning//Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition. Xi'an, China: IEEE: 8-15 [DOI: 10.1109/FG.2018.00012]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]