Survey of image composition based on deep learning

Ye Guosheng; Wang Jianming; Yang Zizhong; Zhang Yuhang; Cui Rongkai; Xuan Shuai

doi:10.11834/jig.220713

Review

2023, Vol. 28, No. 12, pp. 3670-3698
Received: 2022-07-18
Revised: 2023-03-07
Published in print: 2023-12-16

Ye Guosheng, Wang Jianming, Yang Zizhong, Zhang Yuhang, Cui Rongkai, Xuan Shuai. 2023. Survey of image composition based on deep learning. Journal of Image and Graphics, 28(12): 3670-3698. DOI: 10.11834/jig.220713.

Abstract (from the Chinese edition)

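Image composition has long been a research hotspot in image processing, with broad application prospects. Its basic goal is to accurately extract a foreground object from a source image and composite it with a new background, producing an image that is as close to a real photograph as possible. To promote research on and development of deep-learning-based image composition, this paper discusses the main problems currently faced in image composition tasks: 1) the foreground object adaptation problem, which covers geometric consistency (the size, position, and geometric angle of the foreground object relative to the background image) and appearance consistency (mutual occlusion between foreground and background, and blurred edge details of the foreground object); 2) the visual harmonization problem, which covers tonal consistency (mismatched color, contrast, and saturation between foreground and background) and missing shadows (the foreground object lacks its corresponding shadow); 3) the habitat adaptation problem, namely whether the foreground object is logically plausible in the background scene. The paper summarizes the main deep learning methods currently used for each problem, assesses the quality of the resulting composite images and summarizes the corresponding evaluation metrics, introduces the public datasets used for each problem, compares the deep learning methods, and describes the main application scenarios of image composition. Finally, it analyzes the remaining shortcomings of deep-learning-based image composition, offers feasible research suggestions, and outlines future development directions.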
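The basic goal stated above, extracting a foreground object and compositing it onto a new background at a plausible size and position, is commonly modeled with the alpha compositing equation I = αF + (1 − α)B. The Python/NumPy sketch below illustrates that model under simplifying assumptions: the alpha matte is assumed to be given (for example by a separate matting step), the placement box `(top, left, height, width)` is chosen by hand rather than predicted by a network, resizing is nearest-neighbour, and rotation is ignored. The helper names `resize_nearest` and `place_and_composite` are hypothetical and do not correspond to any surveyed method.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W or H x W x C array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]


def place_and_composite(fg, alpha, bg, box):
    """Scale the foreground into box = (top, left, height, width) and alpha-blend it.

    fg:    H x W x 3 float array in [0, 1] (extracted foreground)
    alpha: H x W float array in [0, 1] (matte; 1 = fully foreground)
    bg:    H2 x W2 x 3 float array (new background); box must lie inside bg
    """
    top, left, bh, bw = box
    fg_s = resize_nearest(fg, bh, bw)               # size adjustment
    a = resize_nearest(alpha, bh, bw)[..., None]    # add channel axis for broadcasting
    out = bg.astype(np.float64)                     # astype returns a copy
    region = out[top:top + bh, left:left + bw]
    # I = alpha * F + (1 - alpha) * B, evaluated only inside the placement box
    out[top:top + bh, left:left + bw] = a * fg_s + (1.0 - a) * region
    return out


if __name__ == "__main__":
    fg = np.ones((2, 2, 3))                     # white 2 x 2 foreground patch
    alpha = np.array([[1.0, 0.5], [0.5, 0.0]])  # soft matte
    bg = np.zeros((6, 6, 3))                    # black 6 x 6 background
    print(place_and_composite(fg, alpha, bg, (1, 2, 4, 4))[..., 0])
```

In learning-based placement methods, the box (or a more general spatial transformation) would instead be predicted from the background content rather than fixed by hand.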

Abstract

Image composition has long been a research hotspot in image processing and has a wide range of application prospects. The process involves accurately extracting the foreground objects from an image and compositing them with a new background image. However, traditional image composition methods are often time-consuming and labor-intensive: users must not only extract foreground objects accurately and place them reasonably by hand, but also adjust the lighting, saturation, edge details, shadows, and other attributes of the foreground objects so that the result approaches the quality of a real image. With the development of deep learning in recent years, image composition has been applied increasingly widely and has proven efficient. To promote research on and development of deep-learning-based image composition, this paper discusses four main problems in current image composition tasks. First, the foreground object adaptation problem mainly involves adjusting the size of foreground objects, placing them at suitable spatial positions, processing their blurred edge details, and avoiding unreasonable mutual occlusion between foreground and background. Current deep learning methods for this problem include predicting appropriate bounding boxes for foreground objects in background images, spatial transformer networks, foreground object location distribution prediction combined with adversarial training, image fusion techniques, and guided placement based on domain information. Second, the foreground object harmonization problem mainly concerns inconsistencies in visual attributes, such as illumination, color, saturation, and contrast, between the foreground and the background after composition.
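A classical, non-learned baseline for the color and contrast mismatch just described is to shift and rescale the foreground's per-channel statistics so that they match the background's. The sketch below assumes float RGB input in [0, 1] and a binary foreground mask with both regions non-empty; `match_color_stats` is a hypothetical name. Normalization-based harmonization networks apply a comparable idea to learned feature maps rather than raw pixels, so this is an illustration of the problem, not an implementation of any surveyed method.

```python
import numpy as np


def match_color_stats(comp, mask, eps=1e-6):
    """Match the foreground's per-channel mean/std to the background's.

    comp: H x W x 3 float composite, assumed to be in [0, 1]
    mask: H x W bool array, True on foreground pixels (both regions non-empty)
    Returns a new array; background pixels are copied unchanged.
    """
    out = comp.astype(np.float64)       # astype returns a copy
    fg = out[mask]                      # N_fg x 3 foreground pixels (a copy)
    bg = out[~mask]                     # N_bg x 3 background pixels (a copy)
    mu_f, sd_f = fg.mean(axis=0), fg.std(axis=0)
    mu_b, sd_b = bg.mean(axis=0), bg.std(axis=0)
    # Normalize foreground statistics, then re-scale them to the background statistics
    out[mask] = (fg - mu_f) / (sd_f + eps) * sd_b + mu_b
    # Keep values in the valid range; background pixels are assumed to be in [0, 1] already
    return np.clip(out, 0.0, 1.0)
```

Matching only global first- and second-order statistics cannot handle spatially varying illumination, which is one reason the surveyed methods learn the adjustment instead.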
Current deep learning methods for this problem include attention-based guidance mechanisms, verification and discrimination methods based on domain information, encoder-decoder networks, the context modeling capability of Transformers, auxiliary high dynamic range (HDR) inputs, and techniques borrowed from related tasks such as style transfer. Third, the foreground object shadow harmonization problem mainly involves generating the missing shadows of foreground objects in composite images. Current deep learning methods for this problem include image-rendering-based methods, shadow generation with generative adversarial networks, methods assisted by background ambient lighting, and attention-based methods. Fourth, the habitat adaptation problem between foreground objects and backgrounds concerns the matching of biological information, which should be considered when compositing foreground objects with background images. Whether a foreground object, such as an animal or a plant, can plausibly be composited into a given background is the first question an image composition task should consider: the background chosen for an object must not contradict the object's habitat. For instance, seagulls do not appear in deserts, and flowers do not grow out of ice and snow. The foreground object adaptation problem can be regarded as the key problem in image composition: once foreground objects are composited correctly and reasonably, the subsequent optimization of the composite image can proceed efficiently. Effectively solving the visual harmonization problem further improves the realism of composite images as perceived by users. The most important problem to consider, however, is habitat adaptation between foreground and background.
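Rendering-based shadow generation, the first family listed above for the shadow problem, can be illustrated with a purely geometric heuristic: assume a single distant light and a flat ground plane, shear the object mask along the ground from its contact line, and darken the background where the sheared mask lands. Everything here is an assumption made for illustration (the function `planar_shadow`, its parameters `slope`, `squash`, and `strength`, and the hard-edged result). It does not model soft shadows or estimate lighting from the background, which the learning-based methods mentioned above aim to handle.

```python
import numpy as np


def planar_shadow(comp, mask, slope=1.0, squash=0.3, strength=0.5):
    """Cast a hard shadow of the object mask onto a flat ground plane.

    comp:     H x W x 3 float composite in [0, 1]
    mask:     H x W bool object mask; the object is assumed to touch the
              ground along its lowest mask row
    slope:    horizontal shift (pixels) per pixel of height, standing in for light direction
    squash:   vertical compression of the shadow on the ground (0 = a flat line)
    strength: fraction of brightness removed inside the shadow
    Returns (shaded composite, shadow mask).
    """
    shadow = np.zeros_like(mask, dtype=bool)
    if not mask.any():
        return comp.astype(np.float64), shadow
    h, w = mask.shape
    rows, cols = np.nonzero(mask)
    base = rows.max()                       # contact line with the ground
    height = base - rows                    # pixel height above the contact line
    r2 = base - np.round(squash * height).astype(int)
    c2 = cols + np.round(slope * height).astype(int)
    inside = (r2 >= 0) & (r2 < h) & (c2 >= 0) & (c2 < w)
    shadow[r2[inside], c2[inside]] = True
    shadow &= ~mask                         # the object hides its own shadow
    out = comp.astype(np.float64)           # astype returns a copy
    out[shadow] *= 1.0 - strength           # darken background pixels in shadow
    return out, shadow
```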
Foreground objects and background images cannot be paired arbitrarily; they must satisfy real-world logical relationships, that is, habitat adaptation, which can be regarded as the primary task in image composition. If the habitat information does not match, the foreground object and the background scene lose their logical authenticity, and no subsequent processing can make the composite image realistic. This study summarizes the current deep learning methods, publicly available datasets, and evaluation metrics for each of the above problems, compares the different deep learning methods, and introduces applications of image composition technology. Composite images not only reduce the cost of acquiring real data but can also improve the generalization ability of models. Finally, the shortcomings of deep-learning-based image composition are analyzed, feasible research suggestions are put forward, and future development directions of image composition technology are discussed.
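The abstract mentions evaluation metrics for each problem without naming them. As a hedged illustration, the sketch below computes three full-reference metrics that are common when a real ground-truth image is available: MSE, PSNR, and a foreground-only MSE (often written fMSE in image harmonization work), which avoids diluting the error with background pixels that composition leaves unchanged. Exact definitions vary between papers (pixel range 0-255 vs 0-1, averaging per pixel or per channel), so the choices here, including the default `max_val=255.0`, are assumptions.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))


def fmse(pred, target, mask):
    """MSE restricted to foreground pixels (mask: H x W bool, True on the foreground)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff[mask] ** 2))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val is the largest possible pixel value."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))
```

For a 4 x 4 RGB image whose 2 x 2 foreground is off by 10 in every channel, MSE is 25 while fMSE is 100, which shows how strongly a small foreground is diluted in the whole-image score.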

references

Abu Alhaija H ， Mustikovela S K ， Mescheder L ， Geiger A and Rother C . 2018 . Augmented reality meets computer vision： efficient data generation for urban driving scenes . International Joutnal Computter Vision ， 126 ， 961 - 972 ［ DOI： 10.1007/s11263-018-1070-x http://dx.doi.org/10.1007/s11263-018-1070-x ］

Arjovsky M ， Chintala S and Bottou L . 2017 . Wasserstein generative adversarial networks // Proceedings of the 34th International Conference on Machine Learning . Sydney， Australia ： JMLR.org： 214 - 223

Azadi S ， Pathak D ， Ebrahimi S and Darrell T . 2020 . Compositional GAN： learning image-conditional binary composition . International Journal of Computer Vision ， 128 （ 10/11 ）： 2570 - 2585 ［ DOI： 10.1007/s11263-020-01336-9 http://dx.doi.org/10.1007/s11263-020-01336-9 ］

Barron J T and Malik J . 2015 . Shape， illumination， and reflectance from shading . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 37 （ 8 ）： 1670 - 1687 ［ DOI： 10.1109/TPAMI.2014.2377712 http://dx.doi.org/10.1109/TPAMI.2014.2377712 ］

Bazazian D ， Calway A and Damen D . 2022 . Dual-domain image synthesis using segmentation-guided GAN ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2204.09015.pdf https://arxiv.org/pdf/2204.09015.pdf

Brasó G and Leal-Taixé L . 2020 . Learning a neural solver for multiple object tracking // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 6246 - 6256 ［ DOI： 10.1109/CVPR42600.2020.00628 http://dx.doi.org/10.1109/CVPR42600.2020.00628 ］

Burt P and Adelson E . 1983a . The laplacian pyramid as a compact image code . IEEE Transactions on Communications ， 31 （ 4 ）： 532 - 540 ［ DOI： 10.1109/TCOM.1983.1095851 http://dx.doi.org/10.1109/TCOM.1983.1095851 ］

Burt P J and Adelson E H . 1983b . A multiresolution spline with application to image mosaics . ACM Transactions on Graphics ， 2 （ 4 ）： 217 - 236 ［ DOI： 10.1145/245.247 http://dx.doi.org/10.1145/245.247 ］

Bychkovsky V ， Paris S ， Chan E and Durand F . 2011 . Learning photographic global tonal adjustment with a database of input/output image pairs // Proceedings of 2011 CVPR . Colorado Springs， USA ： IEEE： 97 - 104 ［ DOI： 10.1109/CVPR.2011.5995413 http://dx.doi.org/10.1109/CVPR.2011.5995413 ］

Cao J Y ， Cong W Y ， Niu L ， Zhang J F ， Gao X S ， Tang Z W and Zhang L Q . 2022 . Deep image harmonization by bridging the reality gap ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2103.17104.pdf https://arxiv.org/pdf/2103.17104.pdf

Chang A X ， Funkhouser T ， Guibas L ， Hanrahan P ， Huang Q X ， Li Z M ， Savarese S ， Savva M ， Song S R ， Su H ， Xiao J X ， Yi L and Yu F . 2015 . ShapeNet： an information-rich 3D model repository ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/1512.03012.pdf https://arxiv.org/pdf/1512.03012.pdf

Chen B C and Kae A . 2019 . Toward realistic image compositing with adversarial learning // Processdings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 8407 - 8416 ［ DOI： 10.1109/CVPR.2019.00861 http://dx.doi.org/10.1109/CVPR.2019.00861 ］

Chen Q F ， Li D Z Y and Tang C K . 2012 . KNN matting // Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition . Providence， USA ： IEEE： 869 - 876 ［ DOI： 10.1109/CVPR.2012.6247760 http://dx.doi.org/10.1109/CVPR.2012.6247760 ］

Chen S H ， Wei Y K ， Xu L ， Dong X H and Wen K Z . 2019 . Survey of image style transfer based on deep learning . Application Research of Computers ， 36 （ 8 ）： 2250 - 2255

陈淑環，韦玉科，徐乐，董晓华，温坤哲 . 2019 . 基于深度学习的图像风格迁移研究综述 . 计算机应用研究， 36 （ 8 ）： 2250 - 2255 ［ DOI： 10.19734/j.issn.1001-3695.2018.05.0270 http://dx.doi.org/10.19734/j.issn.1001-3695.2018.05.0270 ］

Cheng D C ， Shi J ， Chen Y Y ， Deng X M and Zhang X P . 2018 . Learning scene illumination by pairwise photos from rear and front mobile cameras . Computer Graphics Forum ， 37 （ 7 ）： 213 - 221 ［ DOI： 10.1111/cgf.13561 http://dx.doi.org/10.1111/cgf.13561 ］

Cheng D L ， Prasad D K and Brown M S . 2014 . Illuminant estimation for color constancy： why spatial-domain methods work and the role of the color distribution . Journal of the Optical Society of America A ， 31 （ 5 ）： 1049 - 1058 ［ DOI： 10.1364/JOSAA.31.001049 http://dx.doi.org/10.1364/JOSAA.31.001049 ］

Cong W Y ， Niu L ， Zhang J F ， Liang J and Zhang L Q . 2021 . Bargainnet： background-guided domain translation for image harmonization // Proceedings of 2021 IEEE International Conference on Multimedia and Expo . Shenzhen， China ： IEEE： #9428394 ［ DOI： 10.1109/ICME51207.2021.9428394 http://dx.doi.org/10.1109/ICME51207.2021.9428394 ］

Cong W Y ， Tao X H ， Niu L ， Liang J ， Gao X S ， Sun Q H and Zhang L Q . 2022 . High-resolution image harmonization via collaborative dual transformations ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2109.06671.pdf https://arxiv.org/pdf/2109.06671.pdf

Cong W Y ， Zhang J F ， Niu L ， Liu L ， Ling Z X ， Li W Y and Zhang L Q . 2020a . DoveNet： deep image harmonization via domain verification // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 8391 - 8400 ［ DOI： 10.1109/CVPR42600.2020.00842 http://dx.doi.org/10.1109/CVPR42600.2020.00842 ］

Cong W Y ， Zhang J F ， Niu L ， Liu L ， Ling Z X ， Li W Y and Zhang L Q . 2020b. Image harmonization dataset iharmony 4： HCOCO ， HAdobe 5 k ， HFlickr ， and hday2night ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/1908.10526.pdf https://arxiv.org/pdf/1908.10526.pdf

Cun X D and Pun C M . 2020 . Improving the harmony of the composite image by spatial-separated attention module . IEEE Transactions on Image Processing ， 29 ： 4759 - 4771 ［ DOI： 10.1109/TIP.2020.2975979 http://dx.doi.org/10.1109/TIP.2020.2975979 ］

Dematteis N and Giordan D . 2021 . Comparison of digital image correlation methods and the impact of noise in geoscience applications . Remote Sensing ， 13 （ 2 ）： # 327 ［ DOI： 10.3390/rs13020327 http://dx.doi.org/10.3390/rs13020327 ］

Dowson D C and Landau B V . 1982 . The Fréchet distance between multivariate normal distributions . Journal of Multivariate Analysis ， 12 （ 3 ）： 450 - 455 ［ DOI： 10.1016/0047-259X（82）90077-X http://dx.doi.org/10.1016/0047-259X（82）90077-X ］

Du C B and Gao S S . 2017 . Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network . IEEE Access ， 5 ： 15750 - 15761 ［ DOI： 10.1109/ACCESS.2017.2735019 http://dx.doi.org/10.1109/ACCESS.2017.2735019 ］

El Helou M ， Zhou R F ， Barthas J and Süsstrunk S . 2020 . VIDIT： virtual image dataset for illumination transfer ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2005.05460.pdf https://arxiv.org/pdf/2005.05460.pdf

Gardner M A ， Hold-Geoffroy Y ， Sunkavalli K ， Gagné C and Lalonde J F . 2019 . Deep parametric indoor lighting estimation // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 7174 - 7182 ［ DOI： 10.1109/ICCV.2019.00727 http://dx.doi.org/10.1109/ICCV.2019.00727 ］

Gardner M A ， Sunkavalli K ， Yumer E ， Shen X H ， Gambaretto E ， Gagné C and Lalonde J F . 2017 . Learning to predict indoor illumination from a single image . ACM Transactions on Graphics ， 36 （ 6 ）： # 176 ［ DOI： 10.1145/3130800.3130891 http://dx.doi.org/10.1145/3130800.3130891 ］

Garon M ， Sunkavalli K ， Hadap S ， Carr N and Lalonde J F . 2019 . Fast spatially-varying indoor lighting estimation // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 6901 - 6910 ［ DOI： 10.1109/CVPR.2019.00707 http://dx.doi.org/10.1109/CVPR.2019.00707 ］

Gatys L ， Ecker A and Bethge M . 2016a . A neural algorithm of artistic style . Journal of Vision ， 16 （ 12 ）： # 326 ［ DOI： 10.1167/16.12.326 http://dx.doi.org/10.1167/16.12.326 ］

Gatys L A ， Ecker A S and Bethge M . 2016b . Image style transfer using convolutional neural networks // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 2414 - 2423 ［ DOI： 10.1109/CVPR.2016.265 http://dx.doi.org/10.1109/CVPR.2016.265 ］

Gehler P V ， Rother C ， Blake A ， Minka T and Sharp T . 2008 . Bayesian color constancy revisited // Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition . Anchorage， USA ： IEEE： #4587765 ［ DOI： 10.1109/CVPR.2008.4587765 http://dx.doi.org/10.1109/CVPR.2008.4587765 ］

Gkioulekas I and Zhi T C . 2017 . Computational photography ［EB/OL］. ［ 2022-05-20 ］. http://graphics.cs.cmu.edu/courses/15-463/2017_fall/lectures/lecture7.pdf http://graphics.cs.cmu.edu/courses/15-463/2017_fall/lectures/lecture7.pdf

Goodfellow I J ， Pouget-Abadie J ， Mirza M ， Xu B ， Warde-Farley D ， Ozair S ， Courville A and Bengio Y . 2014 . Generative adversarial nets // Proceedings of the 27th International Conference on Neural Information Processing Systems . Montréal， Canada ： MIT Press： 2672 - 2680

Grosse R ， Johnson M K ， Adelson E H and Freeman W T . 2009 . Ground truth dataset and baseline evaluations for intrinsic image algorithms // Proceedings of the 12th IEEE International Conference on Computer Vision . Kyoto， Japan ： IEEE： 2335 - 2342 ［ DOI： 10.1109/ICCV.2009.5459428 http://dx.doi.org/10.1109/ICCV.2009.5459428 ］

Guo Z H ， Guo D S ， Zheng H Y ， Gu Z R ， Zheng B and Dong J Y . 2021a . Image harmonization with transformer // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 14850 - 14859 ［ DOI： 10.1109/ICCV48922.2021.01460 http://dx.doi.org/10.1109/ICCV48922.2021.01460 ］

Guo Z H ， Zheng H Y ， Jiang Y F ， Gu Z R and Zheng B . 2021b . Intrinsic image harmonization // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 16362 - 16371 ［ DOI： 10.1109/CVPR46437.2021.01610 http://dx.doi.org/10.1109/CVPR46437.2021.01610 ］

Hao G Q. Iizuka S and Fukui K . 2020 . Image harmonization with attention-based deep feature modulation //Proceedings of the 31st British Machine Vision Conference. Virtual Event， UK ： BMVA Press 2020

He J J . 2020 . Face Image Synthesis Method Research and Application Using Machine Learning Based Image Generation Algorithm . Hefei， China ： University of Science and Technology of China

何冀军 . 2020 . 用于图像生成的机器学习算法在人像合成中的研究与应用 . 合肥：中国科学技术大学

He K M ， Zhang X Y ， Ren S Q and Sun J . 2016 . Deep residual learning for image recognition // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 770 - 778 ［ DOI： 10.1109/cvpr.2016.90 http://dx.doi.org/10.1109/cvpr.2016.90 ］

Heusel M ， Ramsauer H ， Unterthiner T ， Nessler B and Hochreiter S . 2017 . GANs trained by a two time-scale update rule converge to a local nash equilibrium // Proceedings of the 31st International Conference on Neural Information Processing Systems . Long Beach， USA ： Curran Associates Inc.： 6629 - 6640

Hold-Geoffroy Y ， Athawale A and Lalonde J F . 2019 . Deep sky modeling for single image outdoor lighting estimation // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 6920 - 6928 ［ DOI： 10.1109/CVPR.2019.00709 http://dx.doi.org/10.1109/CVPR.2019.00709 ］

Hong Y ， Niu L ， Zhang J F and Zhang L Q . 2022 . Shadow generation for composite image in real-world scenes ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2104.10338.pdf https://arxiv.org/pdf/2104.10338.pdf

Hou L ， Vicente T F Y ， Hoai M and Samaras D . 2021 . Large scale shadow annotation and detection using lazy annotation and stacked CNNs . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 4 ）： 1337 - 1351 ［ DOI： 10.1109/TPAMI.2019.2948011 http://dx.doi.org/10.1109/TPAMI.2019.2948011 ］

Hu X W ， Jiang Y T ， Fu C W and Heng P A . 2019 . Mask-ShadowGan： learning to remove shadows from unpaired data // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 2472 - 2481 ［ DOI： 10.1109/iccv.2019.00256 http://dx.doi.org/10.1109/iccv.2019.00256 ］

Hu Z Y ， Nsampi N E ， Wang X and Wang Q . 2021 . NeurSF： neural shading field for image harmonization ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2112.01314.pdf https://arxiv.org/pdf/2112.01314.pdf

Huang H X and Niu L . 2022 . ccHarmony： color-checker based image harmonization dataset ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2206.00800.pdf https://arxiv.org/pdf/2206.00800.pdf

Isola P ， Zhu J Y ， Zhou T H and Efros A A . 2017 . Image-to-image translation with conditional adversarial networks // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 5967 - 5976 ［ DOI： 10.1109/CVPR.2017.632 http://dx.doi.org/10.1109/CVPR.2017.632 ］

Jaderberg M ， Simonyan K ， Zisserman A and Kavukcuoglu K . 2015 . Spatial transformer networks // Proceedings of the 28th International Conference on Neural Information Processing Systems . Montreal， Canada ： MIT Press： 2017 - 2025 ［ DOI： 10.5555/2969442.2969465 http://dx.doi.org/10.5555/2969442.2969465 ］

Jiang Y F ， Zhang H ， Zhang J M ， Wang Y L ， Lin Z ， Sunkavalli K ， Chen S ， Amirghodsi S ， Kong S and Wang Z Y . 2021 . SSH： a self-supervised framework for image harmonization // Proce-edings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 4812 - 4821 ［ DOI： 10.1109/iccv48922.2021.00479 http://dx.doi.org/10.1109/iccv48922.2021.00479 ］

Kaur H ， Koundal D and Kadyan V . 2021 . Image fusion techniques： a survey . Archives of Computational Methods in Engineering ， 28 （ 7 ）： 4425 - 4447 ［ DOI： 10.1007/s11831-021-09540-7 http://dx.doi.org/10.1007/s11831-021-09540-7 ］

Laffont P Y ， Ren Z L ， Tao X F ， Qian C and Hays J . 2014 . Transient attributes for high-level understanding and editing of outdoor scenes . ACM Transactions on Graphics ， 33 （ 4 ）： # 149 ［ DOI： 10.1145/2601097.2601101 http://dx.doi.org/10.1145/2601097.2601101 ］

Lalonde J F and Efros A A . 2007 . Using color compatibility for assessing image realism // Proceedings of the 11th International Conference on Computer Vision . Rio de Janeiro， Brazil ： IEEE： #4409107 ［ DOI： 10.1109/ICCV.2007.4409107 http://dx.doi.org/10.1109/ICCV.2007.4409107 ］

Lee D ， Liu S F ， Gu J W ， Liu M Y ， Yang M H and Kautz J . 2018 . Context-aware synthesis and placement of object instances // Proceedings of the 32nd International Conference on Neural Information Processing Systems . Montréal， Canada ： Curran Associates Inc.： 10414 - 10424 ［ DOI： 10.5555/3327546.3327701 http://dx.doi.org/10.5555/3327546.3327701 ］

Li X T ， Liu S F ， Kim K ， Wang X L ， Yang M H and Kautz J . 2019 . Putting humans in a scene： learning affordance in 3D indoor environments // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 12360 - 12368 ［ DOI： 10.1109/CVPR.2019.01265 http://dx.doi.org/10.1109/CVPR.2019.01265 ］

Liao B ， Zhu Y ， Liang C ， Luo F and Xiao C X . 2019 . Illumination animating and editing in a single picture using scene structure estimation . Computers and Graphics ， 82 ： 53 - 64 ［ DOI： 10.1016/j.cag.2019.05.007 http://dx.doi.org/10.1016/j.cag.2019.05.007 ］

Lin C H ， Yumer E ， Wang O ， Shechtman E and Lucey S . 2018 . ST-GAN： spatial transformer generative adversarial networks for image compositing // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 9455 - 9464 ［ DOI： 10.1109/CVPR.2018.00985 http://dx.doi.org/10.1109/CVPR.2018.00985 ］

Lin T Y ， Maire M ， Belongie S ， Hays J ， Perona P ， Ramanan D ， Dollár P and Zitnick C L . 2014 . Microsoft COCO： common objects in context // Proceedings of the 13th European Conference on Computer Vision . Zurich， Switzerland ： Springer： 740 - 755 ［ DOI： 10.1007/978-3-319-10602-1_48 http://dx.doi.org/10.1007/978-3-319-10602-1_48 ］

Ling J ， Xue H ， Song L ， Xie R and Gu X . 2021 . Region-aware adaptive instance normalization for image harmonization // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 9357 - 9366 ［ DOI： 10.1109/CVPR46437.2021.00924 http://dx.doi.org/10.1109/CVPR46437.2021.00924 ］

Liu B ， Xu K and Martin R R . 2017 . Static scene illumination estimation from videos with applications . Journal of Computer Science and Technology ， 32 （ 3 ）： 430 - 442 ［ DOI： 10.1007/s11390-017-1734-y http://dx.doi.org/10.1007/s11390-017-1734-y ］

Liu D Q ， Long C J ， Zhang H P ， Yu H N ， Dong X Z and Xiao C X . 2020 . ARShadowGAN： shadow generative adversarial network for augmented reality in single light scenes // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 8136 - 8145 ［ DOI： 10.1109/cvpr42600.2020.00816 http://dx.doi.org/10.1109/cvpr42600.2020.00816 ］

Liu L ， Liu Z C ， Zhang B ， Li J T ， Niu L ， Liu Q Y and Zhang L Q . 2021 . OPA： object placement assessment dataset ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2107.01889.pdf https://arxiv.org/pdf/2107.01889.pdf

Luan F J ， Paris S ， Shechtman E and Bala K . 2018 . Deep painterly harmonization . Computer Graphics Forum ， 37 （ 4 ）： 95 - 106 ［ DOI： 10.1111/cgf.13478 http://dx.doi.org/10.1111/cgf.13478 ］

Make Human Community . 2022 . MakeHuman： open source tool for making 3D characters ［EB/OL］. ［ 2022-05-20 ］. http://www.makehumancommunity.org http://www.makehumancommunity.org

Miao H ， Lu F X ， Liu Z D ， Zhang L J ， Manocha D and Zhou B . 2021 . Robust 2D/3D vehicle parsing in arbitrary camera views for CVIS // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 15611 - 15620 ［ DOI： 10.1109/ICCV48922.2021.01534 http://dx.doi.org/10.1109/ICCV48922.2021.01534 ］

Mirza M and Osindero S . 2014 . Conditional generative adversarial nets ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/1411.1784.pdf https://arxiv.org/pdf/1411.1784.pdf

Nguyen V ， Vicente T F Y ， Zhao M Z ， Hoai M and Samaras D . 2017 . Shadow detection with conditional generative adversarial networks // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 4520 - 4528 ［ DOI： 10.1109/ICCV.2017.483 http://dx.doi.org/10.1109/ICCV.2017.483 ］

Niu L ， Cong W Y ， Liu L ， Hong Y ， Zhang B ， Liang J and Zhang L Q . 2021 . Making images real again： a comprehensive survey on deep image composition ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2106.14490.pdf https://arxiv.org/pdf/2106.14490.pdf

Pandey R ， Escolano S O ， Legendre C ， Häne C ， Bouaziz S ， Rhemann C ， Debevec P and Fanello S . 2021 . Total relighting： learning to relight portraits for background replacement . ACM Transactions on Graphics ， 40 （ 4 ）： # 43 ［ DOI： 10.1145/3450626.3459872 http://dx.doi.org/10.1145/3450626.3459872 ］

Paramanandham N and Rajendiran K . 2018 . Multi sensor image fusion for surveillance applications using hybrid image fusion algorithm . Multimedia Tools and Applications ， 77 （ 10 ）： 12405 - 12436 ［ DOI： 10.1007/s11042-017-4895-3 http://dx.doi.org/10.1007/s11042-017-4895-3 ］

Patil V ， Sale D and Joshi M A . 2013 . Image fusion methods and quality assessment parameters . Asian Journal of Engineering and Applied Technology ， 2 （ 1 ）： 40 - 46

Peng J L ， Luo Z K ， Liu L ， Zhang B S ， Wang T ， Wang Y B ， Tai Y ， Wang C J and Lin W Y . 2022 . FRIH： fine-grained region-aware image harmonization ［EB/OL］. ［ 2022-05-20 ］. https://arxiv.org/pdf/2205.06448.pdf https://arxiv.org/pdf/2205.06448.pdf

Pérez P ， Gangnet M and Blake A . 2003 . Poisson image editing // Proceedings of ACM SIGGRAPH 2003 . San Diego， USA ： Association for Computing Machinery： 313 - 318 ［ DOI： 10.1145/1201775.882269 http://dx.doi.org/10.1145/1201775.882269 ］

Qu G H ， Zhang D L and Yan P F . 2002 . Information measure for performance of image fusion . Electronics Letters ， 38 （ 7 ）： 313 - 315 ［ DOI： 10.1049/el：20020212 http://dx.doi.org/10.1049/el：20020212 ］

Ros G ， Sellart L ， Materzynska J ， Vazquez D and Lopez A M . 2016 . The SYNTHIA dataset： a large collection of synthetic images for semantic segmentation of urban scenes // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 3234 - 3243 ［ DOI： 10.1109/CVPR.2016.352 http://dx.doi.org/10.1109/CVPR.2016.352 ］

Sankaranarayanan S ， Balaji Y ， Jain A ， Lim S N and Chellappa R . 2018 . Learning from synthetic data： addressing domain shift for semantic segmentation // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 3752 - 3761 ［ DOI： 10.1109/cvpr.2018.00395 http://dx.doi.org/10.1109/cvpr.2018.00395 ］

Schieber T A ， Carpi L ， Díaz-Guilera A ， Pardalos P M ， Masoller C and Ravetti M G . 2017 . Quantification of network structural dissimilarities . Nature Communications ， 8 （ 1 ）： # 13928 ［ DOI： 10.1038/ncomms13928 http://dx.doi.org/10.1038/ncomms13928 ］

Sheng Y C ， Zhang J M and Benes B . 2021 . SSN： soft shadow network for image compositing // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 4378 - 4388 ［ DOI： 10.1109/CVPR46437.2021.00436 http://dx.doi.org/10.1109/CVPR46437.2021.00436 ］

Shermeyer J ， Hossler T ， Etten A V ， Hogan D ， Lewis R and Kim D . 2021 . RarePlanes： synthetic data takes flight // Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision . Waikoloa， USA ： IEEE： 207 - 217 ［ DOI： 10.1109/wacv48630.2021.00025 http://dx.doi.org/10.1109/wacv48630.2021.00025 ］

Simonyan K and Zisserman A . 2015 . Verydeep convolutional networks for large-scale image recognition // Proceedings of the 3rd International Conference on Learning Representations . San Diego， USA ， #1556 ［ DOI： 10.48550/arXiv.1409.1556 http://dx.doi.org/10.48550/arXiv.1409.1556 ］

Sofiiuk K ， Popenova P and Konushin A . 2021 . Foreground-aware semantic representations for image harmonization // Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision . Waikoloa， USA ： IEEE： 1619 - 1628 ［ DOI： 10.1109/wacv48630.2021.00166 http://dx.doi.org/10.1109/wacv48630.2021.00166 ］

Strickland E . 2022 . Are you still using real data to train your AI？［EB/OL］. ［ 2022-05-20 ］. https://spectrum.ieee.org/synthetic-data-ai https://spectrum.ieee.org/synthetic-data-ai

Sun T C ， Barron J T ， Tsai Y T ， Xu Z X ， Yu X M ， Fyffe G ， Rhemann C ， Busch J ， Debevec P and Ramamoorthi R . 2019 . Single image portrait relighting . ACM Transactions on Graphics ， 38 （ 4 ）： # 79 ［ DOI： 10.1145/3306346.3323008 http://dx.doi.org/10.1145/3306346.3323008 ］

Szeliski R . 2011 . Computer Vision： Algorithms and Applications . New York， USA ： Springer

Tan F W ， Bernier C ， Cohen B ， Ordonez V and Barnes C . 2018 . Where and who？ Automatic semantic-aware person composition // Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision . Lake Tahoe， USA ： IEEE： 1519 - 1528 ［ DOI： 10.1109/WACV.2018.00170 http://dx.doi.org/10.1109/WACV.2018.00170 ］

Tan X H ， Xu P P ， Guo S H and Wang W C . 2019 . Image composition of partially occluded objects . Computer Graphics Forum ， 38 （ 7 ）： 641 - 650 ［ DOI： 10.1111/cgf.13867 http://dx.doi.org/10.1111/cgf.13867 ］

Tremblay J ， Prakash A ， Acuna D ， Brophy M ， Jampani V ， Anil C ， To T ， Cameracci E ， Boochoon S and Birchfield S . 2018 . Training deep networks with synthetic data： bridging the reality gap by domain randomization // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops . Salt Lake City， USA ： IEEE： 10820 - 10828 ［ DOI： 10.1109/cvprw.2018.00143 http://dx.doi.org/10.1109/cvprw.2018.00143 ］

Tripathi S ， Chandra S ， Agrawal A ， Tyagi A ， Rehg J M and Chari V . 2019 . Learning to generate synthetic data via compositing // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 461 - 470 ［ DOI： 10.1109/CVPR.2019.00055 http://dx.doi.org/10.1109/CVPR.2019.00055 ］

Tsai Y H ， Shen X H ， Lin Z ， Sunkavalli K ， Lu X and Yang M H . 2017 . Deep image harmonization // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 2799 - 2807 ［ DOI： 10.1109/cvpr.2017.299 http://dx.doi.org/10.1109/cvpr.2017.299 ］

Valanarasu J M J ， Zhang H ， Zhang J M ， Wang Y L ， Lin Z ， Echevarria J ， Ma Y L ， Wei Z J ， Sunkavalli K and Patel V . 2023 . Interactive portrait harmonization // Proceedings of the 11th International Conference on Learning Representations . Kigali， Rwanda ： OpenReview.net

Wang J F ， Li X and Yang J . 2018 . Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 1788 - 1797 ［ DOI： 10.1109/CVPR.2018.00192 http://dx.doi.org/10.1109/CVPR.2018.00192 ］

Wang J M ， Cheng X Y ， Yang Z Z ， Shi C Y ， Zhang Y H and Qian Z K . 2022 . Influence of different data augmentation methods on model recognition accuracy . Computer Science ， 49 （ 6 A）： 418 - 423

王建明，陈响育，杨自忠，史晨阳，张宇航，钱正坤 . 2022 . 不同数据增强方法对模型识别精度的影响 . 计算机科学， 49 （ 6 A）： 418 - 423 ［ DOI： 10.11896/jsjkx.210700210 http://dx.doi.org/10.11896/jsjkx.210700210 ］

Wang T Y ， Hu X W ， Wang Q ， Heng P A and Fu C W . 2020 . Instance shadow detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 1877 - 1886 ［ DOI： 10.1109/CVPR42600.2020.00195 http://dx.doi.org/10.1109/CVPR42600.2020.00195 ］

Wang Z and Bovik A C . 2002 . A universal image quality index . IEEE Signal Processing Letters ， 9 （ 3 ）： 81 - 84 ［ DOI： 10.1109/97.995823 http://dx.doi.org/10.1109/97.995823 ］

Wang Z ， Bovik A C ， Sheikh H R and Simoncelli E P . 2004 . Image quality assessment： from error visibility to structural similarity . IEEE Transactions on Image Processing ， 13 （ 4 ）： 600 - 612 ［ DOI： 10.1109/TIP.2003.819861 http://dx.doi.org/10.1109/TIP.2003.819861 ］

Ward D ， Moghadam P and Hudson N . 2018 . Deep leaf segmentation using synthetic data // Proceedings of 2018 British Machine Vision Conference . Newcastle， UK ： BMVA Press

Weber H ， Prévost D and Lalonde J F . 2018 . Learning to estimate indoor lighting from 3D object // Proceedings of 2018 International Conference on 3D Vision . Verona， Italy ： IEEE： 199 - 207 ［ DOI： 10.1109/3dv.2018.00032 http://dx.doi.org/10.1109/3dv.2018.00032 ］

Wu H and Xu D. 2012. Survey of digital image compositing. Journal of Image and Graphics, 17(11): 1333-1346 [DOI: 10.11834/jig.20121101]

Wu H K, Zheng S, Zhang J G and Huang K Q. 2019. GP-GAN: towards realistic high-resolution image blending//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: Association for Computing Machinery: 2487-2495 [DOI: 10.1145/3343031.3350944]

Xu N, Price B, Cohen S and Huang T. 2017. Deep image matting//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 311-320 [DOI: 10.1109/cvpr.2017.41]

Xue S, Agarwala A, Dorsey J and Rushmeier H. 2012. Understanding and improving the realism of image composites. ACM Transactions on Graphics, 31(4): #84 [DOI: 10.1145/2185520.2185580]

Zhan F N, Huang J X and Lu S J. 2021a. Hierarchy composition GAN for high-fidelity image synthesis [EB/OL]. [2022-05-20]. https://arxiv.org/pdf/1905.04693.pdf

Zhan F N, Lu S J, Zhang C G, Ma F Y and Xie X S. 2021b. Adversarial image composition with auxiliary illumination//Proceedings of the 15th Asian Conference on Computer Vision. Kyoto, Japan: Springer: 234-250 [DOI: 10.1007/978-3-030-69532-3_15]

Zhan F N, Zhu H Y and Lu S J. 2019. Spatial fusion GAN for image synthesis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3648-3657 [DOI: 10.1109/CVPR.2019.00377]

Zhang H, Zhang J M, Perazzi F, Lin Z and Patel V M. 2021. Deep image compositing//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 365-374 [DOI: 10.1109/WACV48630.2021.00041]

Zhang J S, Sunkavalli K, Hold-Geoffroy Y, Hadap S, Eisenman J and Lalonde J F. 2019a. All-weather deep outdoor lighting estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10150-10158 [DOI: 10.1109/CVPR.2019.01040]

Zhang L Z, Wen T, Min J, Wang J C, Han D and Shi J B. 2020a. Learning object placement by inpainting for compositional data augmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 566-581 [DOI: 10.1007/978-3-030-58601-0_34]

Zhang L Z, Wen T and Shi J B. 2020b. Deep image blending//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 231-240 [DOI: 10.1109/WACV45572.2020.9093632]

Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/CVPR.2018.00068]

Zhang S Y, Liang R Z and Wang M. 2019b. ShadowGAN: shadow synthesis for virtual objects with conditional adversarial networks. Computational Visual Media, 5(1): 105-115 [DOI: 10.1007/s41095-019-0136-1]

Zhao H S, Shen X H, Lin Z, Sunkavalli K, Price B and Jia J Y. 2018. Compositing-aware image search//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 517-532 [DOI: 10.1007/978-3-030-01219-9_31]

Zhao L, Gao X B and Tian C N. 2013. Review of frontal face image synthesis methods. Journal of Image and Graphics, 18(1): 1-10 [DOI: 10.11834/jig.20130101]

Zhao Y N, Price B, Cohen S and Gurari D. 2019. Unconstrained foreground object search//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2030-2039 [DOI: 10.1109/ICCV.2019.00212]

Zhou B L, Zhao H, Puig X, Fidler S, Barriuso A and Torralba A. 2017. Scene parsing through ADE20K dataset//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5122-5130 [DOI: 10.1109/CVPR.2017.544]

Zhou B L, Zhao H, Puig X, Xiao T T, Fidler S, Barriuso A and Torralba A. 2019. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 127(3): 302-321 [DOI: 10.1007/s11263-018-1140-0]

Zhou H, Sattler T and Jacobs D W. 2016. Evaluating local features for day-night matching//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 724-736 [DOI: 10.1007/978-3-319-49409-8_60]

Zhou P, Han X T, Morariu V I and Davis L S. 2018. Learning rich features for image manipulation detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1053-1061 [DOI: 10.1109/CVPR.2018.00116]

Zhou S Y, Liu L, Niu L and Zhang L Q. 2022. Learning object placement via dual-path graph completion//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 373-389 [DOI: 10.1007/978-3-031-19790-1_23]

Zhu J Y, Krähenbühl P, Shechtman E and Efros A A. 2015. Learning a discriminative model for the perception of realism in composite images//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3943-3951 [DOI: 10.1109/iccv.2015.449]

Zhu S J, Lin Z, Cohen S, Kuen J, Zhang Z F and Chen C. 2022a. GALA: toward geometry-and-lighting-aware object search for compositing//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 676-692 [DOI: 10.1007/978-3-031-19812-0_39]

Zhu Z Y, Zhang Z, Lin Z, Wu R Q and Guo C L. 2022b. Image harmonization by matching regional references [EB/OL]. [2022-05-20]. https://arxiv.org/pdf/2204.04715.pdf

Zuo Y H. 2011. Environmental Studies. 2nd ed. Beijing, China: Higher Education Press: 183-184