Review of generative adversarial networks and their applications in computer vision

Cao Yangjie; Jia Lili; Chen Yongxia; Lin Nan; Li Xuexiang

Published: 2018-09-21
DOI: 10.11834/jig.180103
2018 | Volume 23 | Number 10

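Cao Yangjie, Jia Lili, Chen Yongxia, Lin Nan, Li Xuexiang (School of Software and Applied Science and Technology, Zhengzhou University, Zhengzhou 450000, China)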

Abstract

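Objective The emergence of generative adversarial networks (GANs) has given computer vision new techniques and tools. Building on the distinctive ideas of a zero-sum game and adversarial training, a GAN generates high-quality samples and offers stronger feature learning and feature representation than traditional machine learning algorithms. GANs have achieved notable success in machine vision, especially in sample generation, and are currently one of the most active research directions.

Method This review takes the different GAN models and their applications in computer vision as its subject. Drawing on a broad survey of the literature, particularly the latest developments in GANs, and on comparative experiments across models, it analyzes the basic idea, characteristics, and usage scenarios of each method and summarizes the strengths and weaknesses of GANs. It describes the state of GAN research and the scope of its applications in computer vision, and it outlines the research status and development trends in high-quality image generation, style transfer and image translation, mutual generation of text and images, and image restoration and inpainting. For each application it summarizes the theoretical improvements, advantages, limitations, and usage scenarios, and it discusses possible future directions.

Result Different GAN models each have strengths and weaknesses in sample quality and performance. Current GAN models have achieved considerable success in image processing and can generate samples that pass for real ones, but they also suffer from non-convergence, a tendency to collapse, and excessive, uncontrollable freedom.

Conclusion As a new kind of generative model, GAN has high research and application value, but some theoretical constraints urgently need to be overcome. On the application side, generating high-quality samples and realistic scenes is a direction worth studying.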
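For reference, the zero-sum game mentioned in the abstract is usually written as the minimax objective of the original GAN formulation. The abstract itself gives no formula, so the notation below is an assumption based on the standard convention: generator $G$, discriminator $D$, data distribution $p_{\text{data}}$, and noise prior $p_z$.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes $V$ by assigning high probability to real samples and low probability to generated ones, while the generator minimizes the same quantity, which is what makes the game zero-sum.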

Keywords

generative adversarial networks; computer vision; image generation; image style transfer; image inpainting

Review of computer vision based on generative adversarial networks

Cao Yangjie, Jia Lili, Chen Yongxia, Lin Nan, Li Xuexiang (Department of Software Engineering and Applied Science and Technology, Zhengzhou University, Zhengzhou 450000, China)

Abstract

Objective The emergence of generative adversarial networks (GANs) provides a new approach and framework for computer vision applications. A GAN generates high-quality samples through a distinctive zero-sum game and adversarial training, and is therefore more powerful in both feature learning and feature representation than traditional machine learning algorithms. GANs have achieved remarkable results in computer vision, especially in sample generation, which is one of the most popular topics in current research.

Method This paper reviews the research and application of different GAN models in computer vision, drawing on an extensive survey of the relevant literature and its latest achievements. Typical GAN methods are introduced, categorized, and compared experimentally on their generated samples to show their performance. The research status and development trends are summarized for computer vision fields such as high-quality image generation, style transfer and image translation, text-image mutual generation, and image inpainting and restoration. Finally, the major open research problems are summarized and discussed, and potential future research directions are presented.

Result Since the emergence of GAN, many variants have been proposed for different fields, contributing structural improvements, theoretical developments, or innovative applications. Different GAN models have their own advantages and disadvantages in generating samples. They have produced significant achievements in many fields, especially computer vision, and can generate samples that resemble real ones. However, they also have characteristic problems, such as non-convergence, model collapse, and uncontrollability caused by their high degree of freedom. The original GAN makes almost no prior assumptions about the data, and its ultimate goal is unlimited modeling power that fits any distribution. In addition, GAN models are simple to design: no complex function model has to be designed in advance, and the generator and the discriminator can be trained with the backpropagation algorithm. Moreover, a GAN lets one machine interact with another through continuous confrontation and, given sufficient training data, learn the inherent regularities of the real world. Every advantage has a downside, however, and a series of problems lies behind the goal of unlimited modeling. The generation process is so flexible that the stability and convergence of training cannot be guaranteed; model collapse is likely to occur, after which training cannot continue. The original GAN has the following problems: vanishing gradients, training difficulties, generator and discriminator losses that do not indicate training progress, a lack of diversity in the generated samples, and a tendency to overfit. Discrete distributions are also difficult to generate because of the limitations of GAN. Many researchers have proposed new ways to address these problems, and several landmark models have been introduced, such as DCGAN, CGAN, WGAN, WGAN-GP, EBGAN, BEGAN, InfoGAN, and LSGAN. DCGAN combines GAN with convolutional neural networks (CNNs) and performs well in computer vision. It places a series of constraints on the CNN architecture so that the network can be trained stably and the learned feature representation can be used for sample generation and image classification. CGAN feeds a conditional variable (c) together with the random variable (z) and the real data (x) to guide the data generation process. The conditional variable (c) can be a category label, text, or a generation target. This straightforward improvement has proved extremely effective and has been widely used in subsequent work. WGAN uses the Wasserstein distance instead of the JS divergence to measure the distance between the real and generated sample distributions. The Wasserstein distance has several advantages: it can measure distance even when the two distributions do not overlap, it is smooth, and it alleviates the vanishing gradient problem to some degree. In addition, WGAN addresses the problem of unstable training, diversifies the generated samples, and does not require careful balancing of the training of G and D. WGAN-GP replaces the weight clipping in WGAN with a gradient penalty to enforce the Lipschitz constraint. Experiments show that the samples generated by WGAN-GP are of higher quality than those of WGAN. WGAN-GP also trains stably with almost no hyperparameter tuning and successfully handles a variety of generation tasks. However, WGAN-GP converges more slowly, that is, it takes more time to converge on the same dataset. EBGAN interprets GAN from the perspective of energy. It can learn the probability distribution of images, but it converges slowly. The images that BEGAN produces are still disorganized while other models can already roughly express the outlines of objects. However, the images generated by BEGAN have the sharpest edges and rich diversity in the experiments. The discriminator of BEGAN draws on EBGAN, and its generator loss follows that of WGAN. BEGAN also proposes a hyperparameter that measures the diversity of the generated samples in order to balance D and G and stabilize training. The images generated by InfoGAN have poor internal texture, and the generated objects share the same shape. In the generator, a controllable variable (c) is added to the input noise (z); it carries interpretable information about the data and controls the generated results, which leads to poor diversity. LSGAN can generate high-quality samples because a least-squares loss replaces the cross-entropy loss in the objective function, which partly addresses two shortcomings of GAN: low sample quality and unstable training.

Conclusion As a new generative model, GAN has significant theoretical and practical value. It offers a good solution to the problems of insufficient samples, poor generation quality, and difficult feature extraction. GAN is an inclusive framework that can be combined with most deep learning algorithms to solve problems that traditional machine learning algorithms cannot. However, it has theoretical problems that urgently need to be solved. How to generate high-quality samples and realistic scenes is worth studying. Further GAN development is expected in the following areas: theoretical breakthroughs, algorithm development, evaluation systems, specialized systems, and integration with industry.
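The statement that the Wasserstein distance still measures distance when two distributions do not overlap, while the JS divergence does not, can be checked on a toy case. The sketch below is illustrative only and is not taken from the paper: it assumes discrete distributions on a small 1-D integer grid, and the helper names (`js_divergence`, `wasserstein_1d`, `point_mass`) are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence; it is bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p, q):
    """Wasserstein-1 distance on the integer grid 0..n-1 (unit spacing).

    In one dimension this equals the summed absolute difference
    between the two cumulative distribution functions.
    """
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total


def point_mass(position, size):
    """Distribution that puts all probability on a single grid cell."""
    return [1.0 if i == position else 0.0 for i in range(size)]


# Assumed toy setup: "real" data sits at cell 0, "generated" data at cell `shift`.
real = point_mass(0, 10)
for shift in (1, 3, 6, 9):
    fake = point_mass(shift, 10)
    print(shift, round(js_divergence(real, fake), 4), wasserstein_1d(real, fake))
```

For every non-overlapping pair the JS divergence stays at log 2 (about 0.6931), so it carries no information about how far the generated distribution is from the real one, whereas the Wasserstein distance equals the shift and shrinks as the two distributions approach each other.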

Keywords

generative adversarial networks; computer vision; image generation; style transfer; image inpainting
