A review on deep adversarial visual generation
2021, Vol. 26, No. 12, Pages: 2751-2766
Print publication date: 2021-12-16
Accepted: 2021-07-23
DOI: 10.11834/jig.210252
Mingkui Tan, Shoukai Xu, Shuhai Zhang, Qi Chen. A review on deep adversarial visual generation[J]. Journal of Image and Graphics, 2021,26(12):2751-2766.
Deep visual generation is a popular direction in computer vision that aims to enable computers to automatically generate the expected visual content from input data. It empowers related industries with artificial intelligence techniques, driving their automated and intelligent reform and transformation. Generative adversarial networks (GANs) are an effective tool for deep visual generation; they have attracted great attention in recent years and have become a rapidly developing research direction. GANs can accept input data of multiple modalities, including noise, images, texts and videos, and perform image generation and video generation in an adversarial game fashion; they have been successfully applied to many visual generation tasks. Achieving realistic, diverse and controllable visual generation with GANs is therefore of great research significance. This paper reviews recent work on deep adversarial visual generation. It first introduces the background of deep visual generation and typical generative models, then surveys related algorithms according to the mainstream tasks of deep adversarial visual generation, summarizes the current pain points of the field, and on this basis analyzes future development trends of deep adversarial visual generation.
Deep visual generation aims to create synthetic, photo-realistic visual content (such as images and videos) that can fool or please human perception according to specific requirements. In fact, many human activities belong to the field of visual generation, e.g., advertisement making, house design and film making. However, these tasks can normally be done only by experts with professional skills gained through long-term training, aided by professional software such as Adobe Photoshop. Moreover, producing photo-realistic content may take a very long time, since the process can be tedious and cumbersome. Thus, automating these processes is an important yet non-trivial problem. Nowadays, deep visual generation has become a significant research direction in computer vision and machine learning, and has been applied to many tasks, such as automatic content generation, beautification, rendering and data augmentation.
rendering and data augmentation. Thanks to the current deep generative methods can be categorized into two groups: variational auto-encoder (VAE) based methods and generative adversarial networks (GANs) based methods. Based on encoder-decoder architecture
VAE methods first map input data into a latent distribution
and then minimize the distance between the latent distribution and some prior distribution
e.g.
Gaussian distribution. A well-trained VAE model could be used in the tasks of dimensionality reduction and image generation. However
an inevitable gap between the latent distribution and prior distribution would make the generated images/videos blurred. Unlike the VAE model
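To make this training criterion concrete, the evidence lower bound (ELBO) maximized by a VAE (Kingma and Welling, 2014) can be written as follows; the notation (encoder $q_\phi(z\mid x)$, decoder $p_\theta(x\mid z)$, prior $p(z)$) is standard and is supplied here for illustration rather than taken from the article:
$$\log p_\theta(x) \;\ge\; \mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z\mid x)}\left[\log p_\theta(x\mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z\mid x)\,\|\,p(z)\right),$$
where the KL term is exactly the "distance to the prior" mentioned above, and the residual gap between $q_\phi(z\mid x)$ and $p(z)$ after training is one source of the blurriness noted here.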
Unlike VAEs, a GAN learns a mapping between the input and output distributions and can synthesize sharper images/videos. A GAN model contains two major modules: a generator, which aims to generate fake data, and a discriminator, which distinguishes whether a sample is fake or real. To produce plausible fake data, the generator matches the distribution of real data and synthesizes fake data that fulfill the requirements of reality and diversity. Learning the generator and the discriminator is formulated as a two-player minimax game, as shown below.
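The minimax objective of the original GAN (Goodfellow et al., 2014) is standard and worth restating; here $G$ denotes the generator, $D$ the discriminator, $p_{\mathrm{data}}$ the real-data distribution, and $p_z$ the noise prior:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right].$$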
During training, the two modules are optimized alternately using stochastic gradient methods, and at the end of training the generator and discriminator are expected to reach a Nash equilibrium of the minimax game; a minimal sketch of this alternating update is given below.
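The following sketch of the alternating optimization is written in PyTorch purely for illustration: the toy network sizes, learning rates, and random stand-in data are assumptions of this sketch, not details from the article, and the generator update uses the common non-saturating heuristic rather than the literal minimax loss.

import torch
import torch.nn as nn

# Toy generator (16-dim noise -> 64-dim "image") and discriminator; sizes are illustrative.
G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 64))
D = nn.Sequential(nn.Linear(64, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()  # operates on the raw logits produced by D

for step in range(1000):
    real = torch.randn(32, 64)   # stand-in for a batch of real data
    z = torch.randn(32, 16)      # noise input to the generator

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    fake = G(z).detach()         # detach so this step does not update G
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating heuristic).
    loss_g = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Alternating these two stochastic-gradient updates is exactly the two-player game described above; in practice, convergence to a Nash equilibrium is not guaranteed and remains a central difficulty of GAN training.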
Owing to the development of GAN models, more and more deep visual generation applications and tasks have emerged. Six typical tasks are as follows. 1) Image generation from noise: the earliest task of deep visual generation, in which a GAN model seeks to generate an image (e.g., a face image) from random noise. 2) Image generation from images: transforming a given image into a new one (e.g., from a black-and-white image to a color image); this task supports applications such as style transfer and image reconstruction. 3) Image generation from texts: a very natural task, analogous to a person describing the content of a painting and a painter then drawing the corresponding image from the text. 4) Video generation from images: turning a static image into a dynamic video, which can be used for time-lapse photography, making animated videos from pictures, etc. 5) Video generation from videos: mainly used for video style transfer, video super-resolution and so on. 6) Video generation from texts: more difficult than image generation from texts, since the generated videos must both align semantically with the text and remain consistent across video frames. The challenges in deep visual generation are then analyzed and discussed.
First, rather than 2D data, we should try to generate high-quality 3D data, which contains more information and detail. Second, we could pay more attention to video generation instead of image generation alone. Third, we could conduct research on controllable deep visual generation methods, which are more practical in real-world applications. Finally, we could try to extend style transfer methods from two domains to multiple domains. In this review, we summarize recent works on deep adversarial visual generation through a systematic investigation. The review mainly includes an introduction to the background of deep visual generation, typical generative models, and an overview of mainstream deep visual generation tasks and related algorithms, on the basis of which future research directions for deep adversarial visual generation are discussed.
deep learning; visual generation; generative adversarial networks (GANs); image generation; video generation; 3D-depth image generation; style transfer; controllable generation