Re-GAN: residual generative adversarial network algorithm
2021, Vol. 26, No. 3, pp. 594-604
Print publication date: 2021-03-16
Accepted: 2020-06-15
DOI: 10.11834/jig.200069
Caijuan Shi, Dongjing Tu, Jingyi Liu. Re-GAN: residual generative adversarial network algorithm[J]. Journal of Image and Graphics, 2021, 26(3): 594-604.
Objective (abstract)
A generative adversarial network (GAN) is an unsupervised generative model that produces images through game learning between a generative model and a discriminative model. The generative model of a GAN generates images stage by stage, and a lower-level network cannot access the features learned by the higher-level network, so the generated images lack diversity. In addition, as the number of network layers grows, the parameters increase and backpropagation becomes difficult, leading to problems such as unstable training and vanishing gradients. To address these problems, a residual generative adversarial network (Re-GAN) is proposed based on a residual network (ResNet) and group normalization (GN).
Method (abstract)
Re-GAN builds deep residual modules into the generative model and fuses the features learned by higher-level networks through skip connections. This enhances the diversity and quality of the generated images, improves backpropagation, stabilizes GAN training, and alleviates vanishing gradients. Group normalization (GN) is then adopted to accommodate learning with different batch sizes and make the training process more stable.
Result (abstract)
The performance of the algorithm is tested on the Cifar10, CelebA, and LSUN datasets. With a batch size of 64, the mean IS (inception score) of Re-GAN is 5% and 30% higher than that of DCGAN (deep convolutional GAN) and WGAN (Wasserstein GAN), respectively; with a batch size of 4, it is 0.2% and 13% higher, showing that the images generated by Re-GAN exhibit good diversity regardless of batch size. The FID (Fréchet inception distance) of Re-GAN is 18% and 11% lower than that of DCGAN and WGAN with a batch size of 64, and 4% and 10% lower with a batch size of 4, showing that Re-GAN generates images of higher quality. Re-GAN also alleviates the training instability and vanishing gradients that arise during training.
Conclusion (abstract)
Experimental results show that, for image generation, Re-GAN produces high-quality images with rich diversity; for network training, Re-GAN is more compatible with different batch sizes, making the training process more stable and alleviating vanishing gradients.
Objective
A generative adversarial network (GAN) is a currently popular unsupervised generative model that generates images via game learning between a generative model and a discriminative model. The generative model uses Gaussian noise to generate a probability distribution, and the discriminative model distinguishes between the generated and real probability distributions. In the ideal state, the discriminative model cannot distinguish between the two data distributions. However, achieving Nash equilibrium between the generative and discriminative models is difficult, and problems such as unstable training, gradient disappearance, and poor image quality occur. Many studies have been conducted to address these problems, and they can be divided into two directions: one selects an appropriate loss function, and the other changes the structure of the GAN, e.g., from a fully connected neural network to a convolutional neural network (CNN). A typical work is the deep convolutional GAN (DCGAN), which adopts a CNN and batch normalization (BN). Although DCGANs have achieved good performance, some problems persist in the training process. Increasing the number of network layers leads to more errors, particularly gradient disappearance when the number of layers is extremely high. In addition, BN leads to poor training stability, particularly with small batches. In general, as the number of layers increases, the number of parameters grows and backpropagation becomes difficult, resulting in problems such as unstable training and gradient disappearance. Moreover, the generative model generates images step by step, and a lower-level network cannot access the features learned by a higher-level network; thus, the diversity of the generated images is not sufficiently rich. To address the aforementioned problems, a residual GAN (Re-GAN) is proposed based on a residual network (ResNet) and group normalization (GN).
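As background to the game learning described above (this is an illustrative sketch, not code from the paper), a small NumPy check shows why the ideal discriminator cannot distinguish the two distributions: for discrete distributions, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and the GAN value E_data[log D] + E_g[log(1 - D)] reaches its minimum, -log 4, exactly when p_g matches p_data.

```python
import numpy as np

def gan_value(p_data, p_g):
    """GAN value function under the optimal discriminator, for discrete
    distributions p_data and p_g over the same finite support."""
    d_star = p_data / (p_data + p_g)  # optimal discriminator per bin
    return np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

p_data = np.array([0.2, 0.3, 0.5])
print(gan_value(p_data, p_data))                      # equilibrium: -log 4 ≈ -1.386
print(gan_value(p_data, np.array([0.5, 0.3, 0.2])))   # mismatch: value above -log 4
```

At equilibrium D* outputs 0.5 everywhere, i.e., the discriminator genuinely cannot tell the generated distribution from the real one, which is the Nash equilibrium the training process aims for.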
Method
ResNet was recently proposed to solve the network degradation caused by too many layers in a deep neural network, and it has been applied to image classification due to its good performance. In contrast with BN, GN divides the channels into groups and calculates the normalized mean and variance within each group; the calculation is stable and independent of batch size. Therefore, we apply ResNet and GN to the GAN and propose Re-GAN. First, a residual module (ResNet) is introduced into the generative model of the GAN by adding the layer's input to its mapped output, which prevents gradient disappearance and enhances training stability. The residual module also optimizes feature transmission between the network layers and enhances the diversity and quality of the generated images. Second, Re-GAN adopts GN to adapt to learning with different batch sizes. GN reduces the difficulty of normalization caused by a lack of training samples and stabilizes the training process of the network. Moreover, when the number of samples is sufficient, GN makes the calculated statistics match the sample distribution well and exhibits good compatibility.
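The group-wise statistics described above can be sketched in NumPy (the shapes and group count here are illustrative, not the paper's configuration): channels are split into G groups, and each (sample, group) slice is normalized with its own mean and variance, so the result does not depend on the batch size N.

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    """Minimal group normalization over an (N, C, H, W) tensor:
    normalize each of the G channel groups per sample independently."""
    N, C, H, W = x.shape
    assert C % G == 0, "channels must divide evenly into groups"
    xg = x.reshape(N, G, C // G, H, W)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)   # per-sample, per-group mean
    var = xg.var(axis=(2, 3, 4), keepdims=True)     # per-sample, per-group variance
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(N, C, H, W)

x = np.random.default_rng(0).normal(size=(4, 8, 16, 16))
y = group_norm(x, G=2)
# Each (sample, group) slice of y is now approximately zero-mean, unit-variance,
# and group_norm(x[:1], G=2) equals group_norm(x, G=2)[:1]: batch-size independence.
```

A learnable per-channel scale and shift would normally follow the normalization; they are omitted here to keep the statistics computation visible.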
Result
To verify the effectiveness of the proposed Re-GAN, we compare it with DCGAN and Wasserstein GAN (WGAN) under different batch sizes on three datasets, namely, Cifar10, CelebA, and LSUN bedroom. Two evaluation criteria, i.e., inception score (IS) and Fréchet inception distance (FID), are adopted in our experiments. As a common evaluation criterion for GANs, IS uses the Inception network trained on ImageNet to measure the generated images; a larger IS indicates greater diversity of the generated images. FID is more robust to noise; it is computed between a set of generated images and a set of real images, and a smaller FID indicates higher quality of the generated images. We obtain the following experimental results. 1) When the batch size is 64, the IS of Re-GAN is 5% higher than that of DCGAN and 30% higher than that of WGAN. When the batch size is 4, the IS of Re-GAN is 0.2% higher than that of DCGAN and 13% higher than that of WGAN. These results show that the images generated by Re-GAN exhibit good diversity regardless of batch size. 2) When the batch size is 64, the FID of Re-GAN is 18% lower than that of DCGAN and 11% lower than that of WGAN. When the batch size is 4, the FID of Re-GAN is 4% lower than that of DCGAN and 10% lower than that of WGAN. These results indicate that Re-GAN generates images of higher quality. 3) Training instability and gradient disappearance are alleviated during the training process.
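As a sanity check on the IS criterion used above, here is a minimal NumPy sketch of IS = exp(E_x[KL(p(y|x) || p(y))]), with made-up class-probability rows standing in for the Inception network's softmax outputs; it is not the paper's evaluation code.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from an (images, classes) matrix of class probabilities p(y|x):
    exponentiated mean KL divergence between p(y|x) and the marginal p(y)."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all images
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

confident_diverse = np.eye(3)            # each image a distinct, confident class
uniform = np.full((3, 3), 1.0 / 3.0)     # indistinct predictions for every image
print(inception_score(confident_diverse))  # ≈ 3 (high: confident and diverse)
print(inception_score(uniform))            # ≈ 1 (low: no diversity signal)
```

Confident predictions spread across many classes push the score toward the number of classes, while collapsed or uncertain outputs push it toward 1, which is why a larger IS reflects richer diversity of the generated images.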
Conclusion
The performance of the proposed Re-GAN is tested with two evaluation criteria, i.e., IS and FID, on three datasets. Extensive experiments are conducted, and the results indicate the following findings. For image generation, Re-GAN generates high-quality images with rich diversity. For network training, Re-GAN exhibits better compatibility regardless of whether the batch is large or small, making the training process more stable and alleviating gradient disappearance. In addition, compared with DCGAN and WGAN, the proposed Re-GAN exhibits better performance, which can be attributed to the ResNet and GN it adopts.
Keywords: image generation; deep learning; convolutional neural network (CNN); generative adversarial network (GAN); residual network (ResNet); group normalization (GN)
References
Arjovsky M, Chintala S and Bottou L. 2017. Wasserstein GAN [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1701.07875.pdf
Ba J L, Kiros J R and Hinton G E. 2016. Layer normalization [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1607.06450.pdf
Barratt S and Sharma R. 2018. A note on the inception score [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1801.01973.pdf
Cao Y J, Jia L L, Chen Y X, Lin N and Li X X. 2018. Review of computer vision based on generative adversarial networks. Journal of Image and Graphics, 23(10): 1433-1449 [DOI: 10.11834/jig.180103]
Che T, Li Y R, Jacob A P, Bengio Y and Li W J. 2016. Mode regularized generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.02136.pdf
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I and Abbeel P. 2016. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 2180-2188 [DOI: 10.5555/3157096.3157340]
Donahue J, Krähenbühl P and Darrell T. 2016. Adversarial feature learning [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1605.09782.pdf
Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M and Courville A. 2016. Adversarially learned inference [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1606.00704.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Heusel M, Ramsauer H, Unterthiner T, Nessler B and Hochreiter S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 6629-6640
Huang F R, Ash J, Langford J and Schapire R. 2018. Learning deep ResNet blocks sequentially using boosting theory [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1706.04964.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM: 448-456
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976 [DOI: 10.1109/CVPR.2017.632]
Larsen A B L, Sønderby S K, Larochelle H and Winther O. 2016. Autoencoding beyond pixels using a learned similarity metric//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM: 1558-1566
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140 [DOI: 10.1109/CVPRW.2017.151]
Liu M Y and Tuzel O. 2016. Coupled generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1606.07536.pdf
Mathieu M, Couprie C and LeCun Y. 2015. Deep multi-scale video prediction beyond mean square error [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1511.05440.pdf
Mirza M and Osindero S. 2014. Conditional generative adversarial nets [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1411.1784.pdf
Nitanda A and Suzuki T. 2018. Functional gradient boosting based on residual network perception [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1802.09031.pdf
Qiu Z F, Yao T and Mei T. 2017. Learning spatio-temporal representation with pseudo-3D residual networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5534-5542 [DOI: 10.1109/ICCV.2017.590]
Radford A, Metz L and Chintala S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1511.06434.pdf
Rezende D J, Mohamed S and Wierstra D. 2014. Stochastic backpropagation and approximate inference in deep generative models//Proceedings of the 31st International Conference on Machine Learning. Beijing, China: ACM: II-1278-1286
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y T, Lillicrap T, Hui F, Sifre L, Van Den Driessche G, Graepel T and Hassabis D. 2017. Mastering the game of Go without human knowledge. Nature, 550(7676): 354-359 [DOI: 10.1038/nature24270]
Ulyanov D, Vedaldi A and Lempitsky V. 2016. Instance normalization: the missing ingredient for fast stylization [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1607.08022.pdf
Wang Y X, Zhang L C and Van De Weijer J. 2016. Ensembles of generative adversarial networks [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.00991.pdf
Wu Y X and He K M. 2020. Group normalization. International Journal of Computer Vision, 128(3): 742-755 [DOI: 10.1007/s11263-019-01198-w]
Yeh R A, Chen C, Lim T Y, Schwing A G, Hasegawa-Johnson M and Do M N. 2017. Semantic image inpainting with deep generative models//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6882-6890 [DOI: 10.1109/CVPR.2017.728]
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]
Zhu W T, Xiang X, Tran T D and Xie X H. 2016. Adversarial deep structural networks for mammographic mass segmentation [EB/OL]. [2020-02-13]. https://arxiv.org/pdf/1612.05970.pdf