Re-GAN: residual generative adversarial network algorithm
2021, Vol. 26, No. 3, pp. 594-604
Print publication date: 2021-03-16
Accepted: 2020-06-15
DOI: 10.11834/jig.200069
Caijuan Shi, Dongjing Tu, Jingyi Liu. Re-GAN: residual generative adversarial network algorithm[J]. Journal of Image and Graphics, 2021, 26(3): 594-604.
Objective (abstract)
A generative adversarial network (GAN) is an unsupervised generative model that produces images through game learning between a generative model and a discriminative model. The generative model of a GAN generates images stage by stage, and a lower-level network cannot access the features learned by the higher-level network, so the generated images lack diversity. In addition, as the number of network layers grows, the parameters increase and backpropagation becomes difficult, leading to problems such as unstable training and vanishing gradients. To address these problems, a residual generative adversarial network (Re-GAN) is proposed based on a residual network (ResNet) and group normalization (GN).
Method (abstract)
Re-GAN builds deep residual modules into the generative model and fuses the features learned by higher-level networks through skip connections. This enhances the diversity and quality of the generated images, improves backpropagation, stabilizes GAN training, and alleviates vanishing gradients. Group normalization (GN) is then adopted to accommodate learning with different batch sizes and make the training process more stable.
Result (abstract)
The performance of the algorithm is tested on the Cifar10, CelebA, and LSUN datasets. With a batch size of 64, the mean IS (inception score) of Re-GAN is 5% and 30% higher than that of DCGAN (deep convolutional GAN) and WGAN (Wasserstein GAN), respectively; with a batch size of 4, it is 0.2% and 13% higher, showing that the images generated by Re-GAN exhibit good diversity regardless of batch size. The FID (Fréchet inception distance) of Re-GAN is 18% and 11% lower than that of DCGAN and WGAN with a batch size of 64, and 4% and 10% lower with a batch size of 4, showing that Re-GAN generates images of higher quality. Re-GAN also alleviates the training instability and vanishing gradients that arise during training.
Conclusion (abstract)
Experimental results show that, for image generation, Re-GAN produces high-quality images with rich diversity; for network training, Re-GAN is more compatible with different batch sizes, making the training process more stable and alleviating vanishing gradients.
Objective
A generative adversarial network (GAN) is a currently popular unsupervised generative model that generates images via game learning between a generative model and a discriminative model. The generative model uses Gaussian noise to generate a probability distribution, and the discriminative model distinguishes between the generated and real probability distributions. In the ideal state, the discriminative model cannot distinguish between the two data distributions. However, achieving Nash equilibrium between the generative and discriminative models is difficult, and problems such as unstable training, gradient disappearance, and poor image quality occur. Many studies have been conducted to address these problems, and they can be divided into two directions: one selects an appropriate loss function, and the other changes the structure of the GAN, e.g., from a fully connected neural network to a convolutional neural network (CNN). A typical work is the deep convolutional GAN (DCGAN), which adopts a CNN and batch normalization (BN). Although DCGANs have achieved good performance, some problems persist in the training process. Increasing the number of network layers leads to more errors, particularly gradient disappearance when the number of layers is extremely high. In addition, BN leads to poor training stability, particularly with small batches. In general, as the number of layers increases, the number of parameters grows and backpropagation becomes difficult, resulting in problems such as unstable training and gradient disappearance. Moreover, the generative model generates images step by step, and a lower-level network cannot access the features learned by a higher-level network; thus, the diversity of the generated images is not sufficiently rich. To address the aforementioned problems, a residual GAN (Re-GAN) is proposed based on a residual network (ResNet) and group normalization (GN).
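As background to the game learning described above (this is an illustrative sketch, not code from the paper), a small NumPy check shows why the ideal discriminator cannot distinguish the two distributions: for discrete distributions, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and the GAN value E_data[log D] + E_g[log(1 - D)] reaches its minimum, -log 4, exactly when p_g matches p_data.

```python
import numpy as np

def gan_value(p_data, p_g):
    """GAN value function under the optimal discriminator, for discrete
    distributions p_data and p_g over the same finite support."""
    d_star = p_data / (p_data + p_g)  # optimal discriminator per bin
    return np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

p_data = np.array([0.2, 0.3, 0.5])
print(gan_value(p_data, p_data))                      # equilibrium: -log 4 ≈ -1.386
print(gan_value(p_data, np.array([0.5, 0.3, 0.2])))   # mismatch: value above -log 4
```

At equilibrium D* outputs 0.5 everywhere, i.e., the discriminator genuinely cannot tell the generated distribution from the real one, which is the Nash equilibrium the training process aims for.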
Method
ResNet was recently proposed to solve the network degradation caused by too many layers in a deep neural network, and it has been applied to image classification due to its good performance. In contrast with BN, GN divides the channels into groups and calculates the normalized mean and variance within each group; the calculation is stable and independent of batch size. Therefore, we apply ResNet and GN to the GAN and propose Re-GAN. First, a residual module (ResNet) is introduced into the generative model of the GAN by adding the layer's input to its mapped output, which prevents gradient disappearance and enhances training stability. The residual module also optimizes feature transmission between the network layers and enhances the diversity and quality of the generated images. Second, Re-GAN adopts GN to adapt to learning with different batch sizes. GN reduces the difficulty of normalization caused by a lack of training samples and stabilizes the training process of the network. Moreover, when the number of samples is sufficient, GN makes the calculated statistics match the sample distribution well and exhibits good compatibility.
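The group-wise statistics described above can be sketched in NumPy (the shapes and group count here are illustrative, not the paper's configuration): channels are split into G groups, and each (sample, group) slice is normalized with its own mean and variance, so the result does not depend on the batch size N.

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    """Minimal group normalization over an (N, C, H, W) tensor:
    normalize each of the G channel groups per sample independently."""
    N, C, H, W = x.shape
    assert C % G == 0, "channels must divide evenly into groups"
    xg = x.reshape(N, G, C // G, H, W)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)   # per-sample, per-group mean
    var = xg.var(axis=(2, 3, 4), keepdims=True)     # per-sample, per-group variance
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(N, C, H, W)

x = np.random.default_rng(0).normal(size=(4, 8, 16, 16))
y = group_norm(x, G=2)
# Each (sample, group) slice of y is now approximately zero-mean, unit-variance,
# and group_norm(x[:1], G=2) equals group_norm(x, G=2)[:1]: batch-size independence.
```

A learnable per-channel scale and shift would normally follow the normalization; they are omitted here to keep the statistics computation visible.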
Result
To verify the effectiveness of the proposed Re-GAN, we compare it with DCGAN and Wasserstein GAN (WGAN) under different batch sizes on three datasets, namely, Cifar10, CelebA, and LSUN bedroom. Two evaluation criteria, i.e., inception score (IS) and Fréchet inception distance (FID), are adopted in our experiments. As a common evaluation criterion for GANs, IS uses the Inception network trained on ImageNet to measure the generated images; a larger IS indicates greater diversity of the generated images. FID is more robust to noise; it is computed between a set of generated images and a set of real images, and a smaller FID indicates higher quality of the generated images. We obtain the following experimental results. 1) When the batch size is 64, the IS of Re-GAN is 5% higher than that of DCGAN and 30% higher than that of WGAN. When the batch size is 4, the IS of Re-GAN is 0.2% higher than that of DCGAN and 13% higher than that of WGAN. These results show that the images generated by Re-GAN exhibit good diversity regardless of batch size. 2) When the batch size is 64, the FID of Re-GAN is 18% lower than that of DCGAN and 11% lower than that of WGAN. When the batch size is 4, the FID of Re-GAN is 4% lower than that of DCGAN and 10% lower than that of WGAN. These results indicate that Re-GAN generates images of higher quality. 3) Training instability and gradient disappearance are alleviated during the training process.
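As a sanity check on the IS criterion used above, here is a minimal NumPy sketch of IS = exp(E_x[KL(p(y|x) || p(y))]), with made-up class-probability rows standing in for the Inception network's softmax outputs; it is not the paper's evaluation code.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from an (images, classes) matrix of class probabilities p(y|x):
    exponentiated mean KL divergence between p(y|x) and the marginal p(y)."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all images
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

confident_diverse = np.eye(3)            # each image a distinct, confident class
uniform = np.full((3, 3), 1.0 / 3.0)     # indistinct predictions for every image
print(inception_score(confident_diverse))  # ≈ 3 (high: confident and diverse)
print(inception_score(uniform))            # ≈ 1 (low: no diversity signal)
```

Confident predictions spread across many classes push the score toward the number of classes, while collapsed or uncertain outputs push it toward 1, which is why a larger IS reflects richer diversity of the generated images.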
Conclusion
The performance of the proposed Re-GAN is tested with two evaluation criteria, i.e., IS and FID, on three datasets. Extensive experiments are conducted, and the results indicate the following findings. For image generation, Re-GAN generates high-quality images with rich diversity. For network training, Re-GAN exhibits better compatibility regardless of whether the batch is large or small, making the training process more stable and alleviating gradient disappearance. In addition, compared with DCGAN and WGAN, the proposed Re-GAN exhibits better performance, which can be attributed to the ResNet and GN it adopts.
Keywords: image generation; deep learning; convolutional neural network (CNN); generative adversarial network (GAN); residual network (ResNet); group normalization (GN)
References
Arjovsky M, Chintala S and Bottou L. 2017. Wasserstein GAN [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1701.07875.pdf
Ba J L, Kiros J R and Hinton G E. 2016. Layer normalization [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1607.06450.pdf
Barratt S and Sharma R. 2018. A note on the inception score [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1801.01973.pdf
Cao Y J, Jia L L, Chen Y X, Lin N and Li X X. 2018. Review of computer vision based on generative adversarial networks. Journal of Image and Graphics, 23(10): 1433-1449 [DOI: 10.11834/jig.180103]
Che T, Li Y R, Jacob A P, Bengio Y and Li W J. 2016. Mode regularized generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.02136.pdf
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I and Abbeel P. 2016. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 2180-2188 [DOI: 10.5555/3157096.3157340]
Donahue J, Krähenbühl P and Darrell T. 2016. Adversarial feature learning [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1605.09782.pdf
Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M and Courville A. 2016. Adversarially learned inference [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1606.00704.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Heusel M, Ramsauer H, Unterthiner T, Nessler B and Hochreiter S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 6629-6640
Huang F R, Ash J, Langford J and Schapire R. 2018. Learning deep ResNet blocks sequentially using boosting theory [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1706.04964.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM: 448-456
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI: 10.1109/CVPR.2017.632]
Larsen A B L, Sønderby S K, Larochelle H and Winther O. 2016. Autoencoding beyond pixels using a learned similarity metric//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM: 1558-1566
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Liu M Y and Tuzel O. 2016. Coupled generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1606.07536.pdf
Mathieu M, Couprie C and LeCun Y. 2015. Deep multi-scale video prediction beyond mean square error [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1511.05440.pdf
Mirza M and Osindero S. 2014. Conditional generative adversarial nets [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1411.1784.pdf
Nitanda A and Suzuki T. 2018. Functional gradient boosting based on residual network perception [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1802.09031.pdf
Qiu Z F, Yao T and Mei T. 2017. Learning spatio-temporal representation with pseudo-3D residual networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5534-5542 [DOI: 10.1109/ICCV.2017.590]
Radford A, Metz L and Chintala S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1511.06434.pdf
Rezende D J, Mohamed S and Wierstra D. 2014. Stochastic backpropagation and approximate inference in deep generative models//Proceedings of the 31st International Conference on Machine Learning. Beijing, China: ACM: II-1278-1286
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y T, Lillicrap T, Hui F, Sifre L, Van Den Driessche G, Graepel T and Hassabis D. 2017. Mastering the game of Go without human knowledge. Nature, 550(7676): 354-359 [DOI: 10.1038/nature24270]
Ulyanov D, Vedaldi A and Lempitsky V. 2016. Instance normalization: the missing ingredient for fast stylization [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1607.08022.pdf
Wang Y X, Zhang L C and Van De Weijer J. 2016. Ensembles of generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.00991.pdf
Wu Y X and He K M. 2020. Group normalization. International Journal of Computer Vision, 128(3): 742-755 [DOI: 10.1007/s11263-019-01198-w]
Yeh R A, Chen C, Lim T Y, Schwing A G, Hasegawa-Johnson M and Do M N. 2017. Semantic image inpainting with deep generative models//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6882-6890 [DOI: 10.1109/CVPR.2017.728]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]
Zhu W T, Xiang X, Tran T D and Xie X H. 2016. Adversarial deep structural networks for mammographic mass segmentation [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.05970.pdf