Infrared-to-visible image translation based on parallel generator network
2021, Vol. 26, No. 10, pp. 2346-2356
Received: 2020-04-03; Revised: 2020-09-23; Accepted: 2020-09-30; Published in print: 2021-10-16
DOI: 10.11834/jig.200113
Objective
To address the single-structure generator network problem in existing deep learning models for image translation, this paper improves the structure of the conditional generative adversarial network (CGAN) and proposes a parallel generator network model that fuses two different architectures, a residual network (ResNet) and a densely connected network (DenseNet).
Method
Residual and dense generator branch networks are constructed; an infrared image is fed into the residual and dense generator branches, each of which produces its own translated visible image, and a linear interpolation algorithm based on image segmentation is proposed to fuse the translated images from the branch generators into the final translated visible image, as sketched below. To prevent overfitting when training on small samples, dropout layers are inserted into the discriminator network. An optimal-threshold segmentation objective function is designed to obtain the optimal fusion parameters during training of the parallel generator network.
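A minimal sketch of the segmentation-and-interpolation fusion step described above, in Python/NumPy. The branch outputs (`img_res`, `img_dense`), the intensity threshold, the per-region weights, and the use of the ResNet-branch intensity as the segmentation cue are illustrative placeholders; the paper learns the threshold and fusion parameters with its optimal-threshold objective, which is not reproduced here.

```python
import numpy as np

def fuse_branches(img_res, img_dense, threshold, w_low, w_high):
    """Fuse the ResNet-branch and DenseNet-branch outputs (H x W x 3, values in [0, 255])
    by thresholding intensity into low/high regions and linearly interpolating the two
    branch images in each region and each RGB channel (illustrative sketch only)."""
    intensity = img_res.mean(axis=2, keepdims=True)       # assumed segmentation cue
    low = (intensity <= threshold).astype(img_res.dtype)  # low-intensity region mask
    high = 1.0 - low                                       # high-intensity region mask

    fused_low = w_low * img_res + (1.0 - w_low) * img_dense     # interpolation, low region
    fused_high = w_high * img_res + (1.0 - w_high) * img_dense  # interpolation, high region
    return low * fused_low + high * fused_high
```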
Result
Tested on a public infrared-visible dataset, the proposed method achieves significant improvements in mean square error (MSE) and structural similarity (SSIM) over existing deep learning image translation models such as Pix2Pix and CycleGAN.
Conclusion
The parallel generator network model effectively combines the advantages of its branch network structures, and its image translation results are more accurate and realistic.
Objective
Image-to-image translation involves the automated conversion of input data into a corresponding output image that differs in characteristics such as color and style. Examples include converting a photograph to a sketch or a visible image to a semantic label map. Translation has various applications in computer vision, such as facial recognition, person identification, and image dehazing. In 2014, Goodfellow et al. proposed an image generation model based on generative adversarial networks (GANs). This algorithm uses a loss function to classify output images as authentic or fabricated while simultaneously training a generative model to minimize that loss, and GANs have achieved impressive image generation results specifically through this adversarial loss. For example, the image-to-image translation framework Pix2Pix was developed using a GAN architecture. Pix2Pix operates by learning a conditional generative model from input-output image pairs, which is well suited to translation tasks; in addition, a U-Net is often used as the generator network in place of a conventional encoder-decoder. While Pix2Pix provides a robust framework for image translation, acquiring sufficient quantities of paired input-output training data can be challenging. To address this problem, cycle-consistent adversarial networks (CycleGANs) were developed by adding an inverse mapping and a cycle-consistency loss to enforce the relationship between generated and input images; ResNets have also been used as generators to enhance translated image quality. Pix2PixHD offers high-resolution (2 048×1 024 pixels) output using a modified multiscale generator network that includes an instance map in the training step. Although these algorithms have been used effectively for image-to-image translation and a variety of related applications, they typically adopt U-Net or ResNet generators, and such single-structure networks struggle to maintain high performance across multiple evaluation indicators. This study therefore presents a novel parallel stream-based generator network to increase robustness across multiple evaluation indicators. Unlike previous studies, the proposed model consists of two entirely different convolutional neural network (CNN) structures, and the translated visible images output by the two streams are fused with a linear interpolation-based method that allows the parameters of both models to be optimized simultaneously.
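For reference, the standard conditional adversarial objective that Pix2Pix-style translators optimize (Mirza and Osindero 2014; Isola et al. 2017), with $x$ the input (here infrared) image, $y$ the target visible image, $G$ the generator, and $D$ the discriminator:

$$\min_{G}\max_{D}\;\mathcal{L}_{\mathrm{cGAN}}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x}\left[\log\left(1-D\left(x,G(x)\right)\right)\right]$$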
Method
The proposed parallel generator network consists of one ResNet processing stream and one DenseNet processing stream, which are fused in parallel. The ResNet stream includes down-sampling layers and nine Res-Unit feature extraction blocks; each Res-Unit is a feedforward block in which an elementwise addition skips over two convolution layers. Similarly, the DenseNet stream includes down-sampling layers and nine Den-Unit feature extraction blocks; each Den-Unit is composed of three convolutional layers and two concatenation layers, so the Den-Unit output draws on a concatenation of the deep feature maps produced by all three convolutional layers. To exploit the advantages of both the ResNet and DenseNet streams, the two generated images are segmented into low- and high-intensity parts with an optimal intensity threshold, and a linear interpolation method is proposed to fuse the segmented outputs of the two generator streams in the R, G, and B channels, respectively. We also design an intensity-threshold objective function to obtain the optimal parameters during the generator training process. In addition, to avoid overfitting when training on a small dataset, we modify the discriminator structure to include four convolution-dropout pairs followed by a convolution layer.
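A minimal PyTorch sketch of the three building blocks described above. Channel widths, kernel sizes, activations, and the dropout rate are illustrative assumptions; the abstract does not specify them, and the Den-Unit wiring shown is one plausible reading of the description.

```python
import torch
import torch.nn as nn

class ResUnit(nn.Module):
    """Feedforward block whose elementwise addition skips over two convolution layers."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # skip connection over the two conv layers

class DenUnit(nn.Module):
    """Three convolution layers linked by two concatenation layers, so the last
    convolution sees the stacked feature maps of the earlier layers (DenseNet-style)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, x):
        f1 = torch.relu(self.conv1(x))
        f2 = torch.relu(self.conv2(torch.cat([x, f1], dim=1)))  # first concatenation
        return self.conv3(torch.cat([x, f1, f2], dim=1))        # second concatenation

class Discriminator(nn.Module):
    """Four convolution-dropout pairs followed by a final convolution layer."""
    def __init__(self, in_ch=6, base=64, p=0.5):  # in_ch: e.g., stacked IR + visible pair (assumption)
        super().__init__()
        layers, ch = [], in_ch
        for i in range(4):
            layers += [nn.Conv2d(ch, base * 2 ** i, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True),
                       nn.Dropout(p)]            # dropout to reduce overfitting on small data
            ch = base * 2 ** i
        layers += [nn.Conv2d(ch, 1, 4, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, pair):
        return self.net(pair)  # patch-level real/fake scores
```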
Result
We compared our model with six state-of-the-art image-to-image translation models, namely CRN (cascaded refinement networks), SIMS (semi-parametric image synthesis), Pix2Pix (pixel to pixel), CycleGAN (cycle-consistent generative adversarial networks), MUNIT (multimodal unsupervised image-to-image translation), and GauGAN (spatially-adaptive normalization generative adversarial networks), on the public AAU (Aalborg University) RainSnow Traffic Surveillance Dataset. The experimental dataset, composed of 22 five-minute video sequences acquired at traffic intersections in the Danish cities of Aalborg and Viborg, was used for testing purposes. It was collected at seven different locations with a conventional RGB camera and a thermal camera, each with a resolution of 640×480 pixels, at 20 frames per second. The total experimental dataset consisted of 2 100 RGB-IR image pairs, and each scene was randomly divided into training and test sets at an 80%/20% ratio. Multi-perspective evaluation results were acquired using the mean square error (MSE), structural similarity index (SSIM), gray-intensity histogram correlation, and Bhattacharyya distance.
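These four measures can be computed with common libraries as sketched below (scikit-image and OpenCV); the bin count and normalization are assumptions, as the abstract does not state the exact settings used.

```python
import cv2
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def evaluate_pair(pred_rgb, gt_rgb):
    """pred_rgb, gt_rgb: uint8 H x W x 3 translated and ground-truth visible images."""
    mse = mean_squared_error(gt_rgb, pred_rgb)
    ssim = structural_similarity(gt_rgb, pred_rgb, channel_axis=2)

    # Gray-intensity histograms (256 bins), normalized before comparison.
    h_pred = cv2.calcHist([cv2.cvtColor(pred_rgb, cv2.COLOR_RGB2GRAY)], [0], None, [256], [0, 256])
    h_gt = cv2.calcHist([cv2.cvtColor(gt_rgb, cv2.COLOR_RGB2GRAY)], [0], None, [256], [0, 256])
    cv2.normalize(h_pred, h_pred)
    cv2.normalize(h_gt, h_gt)

    corr = cv2.compareHist(h_pred, h_gt, cv2.HISTCMP_CORREL)           # higher is better
    bhatta = cv2.compareHist(h_pred, h_gt, cv2.HISTCMP_BHATTACHARYYA)  # lower is better
    return mse, ssim, corr, bhatta
```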
The advantages of the parallel stream-based generator were assessed by comparing the proposed parallel generator with a ResNet, a DenseNet, and a residual dense block (RDB)-based hybrid network (RDN). We evaluated the average MSE and SSIM values over the test data produced by the four generators (ParaNet, ResNet, DenseNet, and RDN). The proposed method achieved an average MSE of 34.835 8, lower than those of the ResNet, DenseNet, and hybrid RDN networks. At the same time, the average SSIM value produced with the proposed method was 0.747 7, higher than those of DenseNet, ResNet, and RDN. This result shows that the proposed parallel structure yields more effective fusion than the RDB-based hybrid network structure. Moreover, the comparative experiments demonstrate that the parallel generator structure improves robustness across multi-perspective evaluations for infrared-to-visible image translation. Compared with the six conventional methods, the MSE (lower is better) improved by at least 22.30% and the SSIM (higher is better) improved by at least 8.55%. These results show that the proposed parallel generator network-based infrared-to-visible image translation model outperforms conventional deep learning models such as CRN, SIMS, Pix2Pix, CycleGAN, MUNIT, and GauGAN in terms of both MSE and SSIM.
Conclusion
A novel parallel stream architecture-based generator network was proposed for infrared-to-visible image translation. Unlike conventional models, the proposed parallel generator consists of two different network architectures: a ResNet and a DenseNet. Parallel linear combination-based fusion allows the model to draw on the benefits of both networks simultaneously. The discriminator network used in the conditional GAN framework was also improved for training and for identifying the optimal ParaNet parameters. The experimental results showed that combining the two different networks improved the common assessment metrics: the proposed parallel generator network achieved higher SSIM and intensity-histogram similarity, and lower MSE, than existing models. In the future, this algorithm will be applied to image dehazing.
Chen Q F and Koltun V. 2017. Photographic image synthesis with cascaded refinement networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 1520-1529 [DOI: 10.1109/ICCV.2017.168]
Chen W, Li Z W and Yin Z. 2019. Image dehazing algorithm based on generative adversarial network. Information and Control, 48(6): 707-714, 722 [DOI: 10.13976/j.cnki.xk.2019.9078]
Engin D, Genç A and Ekenel H K. 2018. Cycle-Dehaze: enhanced CycleGAN for single image dehazing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City, USA: IEEE: 825-833 [DOI: 10.1109/CVPRW.2018.00127]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE [DOI: 10.1109/CVPR.2017.243]
Huang H, Tao H J and Wang H F. 2019. Low-illumination image enhancement using a conditional generative adversarial network. Journal of Image and Graphics, 24(12): 2149-2158 [DOI: 10.11834/jig.190145]
Huang X, Liu M Y, Belongie S and Kautz J. 2018. Multimodal unsupervised image-to-image translation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 179-196 [DOI: 10.1007/978-3-030-01219-9_11]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks[EB/OL]. [2016-11-21]. https://arxiv.org/pdf/1611.07004.pdf
Jensen M B, Bahnsen C H, Lahrmann H S, Madsen T K O and Moeslund T B. 2018. Collecting traffic video data using portable poles: survey, proposal, and analysis. Journal of Transportation Technologies, 8(4): 376-400[DOI: 10.4236/jtts.2018.84021]
Liu Z L, Zhu W and Yuan Z Y. 2019. Image instance style transfer combined with fully convolutional network and CycleGAN. Journal of Image and Graphics, 24(8): 1283-1291 [DOI: 10.11834/jig.180624]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets[EB/OL]. [2014-11-06]. https://arxiv.org/pdf/1411.1784.pdf
Park T, Liu M Y, Wang T C and Zhu J Y. 2019. Semantic image synthesis with spatially-adaptive normalization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2332-2341 [DOI: 10.1109/CVPR.2019.00244]
Peng Z M, Li Z C, Zhang J G, Li Y, Qi G J and Tang J H. 2019. Few-shot image recognition with knowledge transfer//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 441-449 [DOI: 10.1109/ICCV.2019.00053]
Qi X J, Chen Q F, Jia J Y and Koltun V. 2018. Semi-parametric image synthesis//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8808-8816 [DOI: 10.1109/CVPR.2018.00918]
Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J and Catanzaro B. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8798-8807 [DOI: 10.1109/CVPR.2018.00917]
Wei L H, Zhang S L, Gao W and Tian Q. 2018. Person transfer GAN to bridge domain gap for person re-identification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 79-88 [DOI: 10.1109/CVPR.2018.00016]
Yang X L, Lin S Z, Lu X F, Wang L F, Li D W and Wang B. 2019. Multimodal image fusion based on generative adversarial networks. Laser and Optoelectronics Progress, 56(16): #161004 [DOI: 10.3788/LOP56.161004]
Yao N M, Guo Q P, Qiao F C, Chen H and Wang H A. 2018. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 44(5): 865-877 [DOI: 10.16383/j.aas.2018.c170477]
Zhang H, Han H, Cui J Y, Shan S G and Chen X L. 2018. RGB-D face recognition via deep complementary and common feature learning//Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition. Xi'an, China: IEEE: 8-15 [DOI: 10.1109/FG.2018.00012]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]