Enhanced semantic dual decoder generation model for image inpainting
2022, Vol. 27, No. 10, pp. 2994-3009
Print publication date: 2022-10-16
Accepted: 2021-08-02
DOI: 10.11834/jig.210301
Qianna Wang, Yi Chen. Enhanced semantic dual decoder generation model for image inpainting[J]. Journal of Image and Graphics, 2022,27(10):2994-3009.
Objective
Although image inpainting has made considerable progress, when the missing region of an image is large, the non-missing regions provide very limited information, which makes it difficult to generate semantically consistent content and thus to keep the repaired image visually consistent with the real image. Moreover, image inpainting commonly adopts a two-stage network structure; models built on it not only require a long training time but also make the inpainting result strongly dependent on the output of the first stage. To address these problems, we propose a dual-decoder image inpainting method with enhanced semantic consistency.
Method
A dual-decoder network structure is used to eliminate the dependency problem of two-stage inpainting methods while effectively shortening the training time of the model. Consistency loss, perceptual loss, and style loss are employed to better capture the contextual semantic information of images and to resolve the visual inconsistency that arises in image inpainting. In addition, skip connections are used, and a multi-scale attention module and dilated convolution are introduced to further improve the feature extraction ability of the network.
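As a rough illustration of how such loss terms are usually combined into a single training objective, a minimal Python sketch follows; the adversarial term and every weight here are hypothetical placeholders, not values taken from the paper.

```python
def generator_loss(l_consistency, l_perceptual, l_style, l_adv,
                   w_c=1.0, w_p=0.1, w_s=250.0, w_a=0.2):
    # Weighted sum of the semantic loss terms plus an adversarial term;
    # all weights are illustrative placeholders, not the paper's settings.
    return (w_c * l_consistency + w_p * l_perceptual
            + w_s * l_style + w_a * l_adv)
```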
Result
For a fair evaluation, experiments were conducted on images with regular and irregular missing regions on three datasets: CelebA, Stanford Cars, and UCF Google Street View. Five objective metrics were used: mean square error (L2), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Fréchet inception distance (FID), and inception score (IS). The experimental results show that the images repaired by our method are not only visually better but also achieve superior scores. For example, with regular missing regions on the CelebA dataset, the FID (lower is better) of our method is 39.2% lower than that of the second-best model; on the UCF Google Street View dataset, the PSNR of our method is 12.64%, 6.77%, and 4.41% higher than that of the other models, respectively.
Conclusion
Our method effectively reduces the training time of the model, eliminates the dependency problem of two-stage network models, and produces repaired images with better visual consistency.
Objective
Image inpainting for computer vision has been widely used in image and video editing, medical imaging, and public security. Constrained by large missing regions in an image, most existing methods fail to generate semantically consistent content that ensures visual consistency between the repaired image and the real image, because the non-missing regions provide very limited information. The outputs of the generator are often distorted by color differences, blurring, and other artifacts. In addition, model design has become complex in pursuit of high-quality inpainting results, especially with the two-stage network structure: the first stage coarsely predicts the content of the missing regions, and this prediction is fed into the second stage to refine the previous result. This improves the inpainting effect to some extent, but the two-stage structure often leads to long training times and a dependency issue, meaning that the inpainting effect is strongly dependent on the result of the first stage.
Method
We propose a dual-decoder image inpainting method with enhanced semantic consistency. First, we use a consistency loss to reduce the difference between the features of the encoder and those of the corresponding decoder. Meanwhile, a perceptual loss and a style loss are combined to improve the similarity between the repaired image and the real image. These loss functions are defined on high-level deep features, which motivates the network to better capture the contextual semantic information of images, thus producing semantically consistent content and ensuring visual consistency between the repaired image and the real image.
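Perceptual and style losses of this kind are conventionally computed on frozen pretrained VGG features (Johnson et al., 2016; Gatys et al., 2016). A minimal PyTorch sketch under that assumption; the chosen layers and the L1 distance are illustrative, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG-16 feature extractor, a common backbone for these losses.
vgg = models.vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

LAYERS = {3, 8, 15}  # relu1_2, relu2_2, relu3_3 (an illustrative choice)

def vgg_features(x):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats

def gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_and_style_loss(fake, real):
    l_perc = fake.new_zeros(())
    l_style = fake.new_zeros(())
    for ff, fr in zip(vgg_features(fake), vgg_features(real)):
        l_perc = l_perc + F.l1_loss(ff, fr)                # feature distance
        l_style = l_style + F.l1_loss(gram(ff), gram(fr))  # style distance
    return l_perc, l_style
```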
Second, we build a single-encoder network with a dual decoder consisting of a simple path and a reconstruction path, which removes the training cost of a two-stage network and the dependence of its inpainting effect on the first stage. The simple path roughly predicts the content of the missing regions, the reconstruction path generates a higher-quality result, and the two outputs are regularized by sharing weights. The dual-decoder structure allows the two inpainting paths to run independently and simultaneously, eliminating the dependency problem and the extra training cost of the two-stage structure.
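A minimal sketch of the single-encoder, dual-decoder layout described above; the depths, channel widths, and module names are assumptions for illustration, and the weight-sharing regularization between the two paths is omitted.

```python
import torch
import torch.nn as nn

class DualDecoderInpainter(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # One shared encoder; two decoders for the simple and
        # reconstruction paths (sizes are placeholders).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.simple_decoder = self._decoder(ch)
        self.recon_decoder = self._decoder(ch)

    def _decoder(self, ch):
        return nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, image, mask):
        # Concatenate the masked image with its mask as network input.
        x = self.encoder(torch.cat([image * (1 - mask), mask], dim=1))
        # Both paths decode the same features independently, so neither
        # output depends on the other (unlike a two-stage pipeline).
        return self.simple_decoder(x), self.recon_decoder(x)
```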
Finally, we apply the U-Net structure and introduce skip connections between the encoder and decoder to improve the feature extraction ability, which compensates for the information lost through down-sampling. Additionally, dilated convolution is utilized in the encoder to enlarge the receptive field of the model, and a multi-scale attention module is added in the decoder to enhance the ability to extract features from distant regions.
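For instance, dilated convolutions enlarge the receptive field while preserving spatial resolution; a hedged sketch of such an encoder block follows (the dilation rates are a common choice, not necessarily the paper's, and the multi-scale attention module is omitted here).

```python
import torch.nn as nn

def dilated_block(channels):
    # Stacked dilated convolutions with growing rates; padding equals
    # dilation, so spatial size is preserved while the receptive field grows.
    return nn.Sequential(*[
        nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
            nn.ReLU(),
        )
        for d in (2, 4, 8)
    ])
```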
Result
We carried out experiments on three datasets: CelebA, Stanford Cars, and UCF Google Street View. Images generally contain regular or irregular missing regions, so for a fair evaluation we performed experiments on images with both centered and irregular holes. All masks and images are set to a resolution of 256×256 pixels for training and testing; the missing region of the central regular mask is 128×128 pixels, and the irregular masks are randomly generated.
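A small sketch of how such masks can be produced: the centered 128×128 hole matches the setup above, while the irregular-mask generator is a generic stand-in, since the paper's exact procedure is not specified here.

```python
import numpy as np

def center_mask(size=256, hole=128):
    # Regular mask: a 128x128 hole centered in a 256x256 image (1 = missing).
    m = np.zeros((size, size), dtype=np.float32)
    o = (size - hole) // 2
    m[o:o + hole, o:o + hole] = 1.0
    return m

def random_irregular_mask(size=256, strokes=8, rng=None):
    # Generic irregular mask made of random thick strokes; a stand-in for
    # whatever irregular-mask source the paper actually used.
    rng = rng or np.random.default_rng()
    m = np.zeros((size, size), dtype=np.float32)
    for _ in range(strokes):
        x, y = rng.integers(0, size, 2)
        for _ in range(rng.integers(10, 40)):
            x = int(np.clip(x + rng.integers(-10, 11), 0, size - 1))
            y = int(np.clip(y + rng.integers(-10, 11), 0, size - 1))
            m[max(0, y - 6):y + 6, max(0, x - 6):x + 6] = 1.0
    return m
```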
The qualitative results show that our method is more effective than the six methods compared, and the repaired images are visually more consistent with the real images. Furthermore, quantitative comparisons between the proposed method and the other methods are conducted on five metrics: mean square error (MSE, L2), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Fréchet inception distance (FID), and inception score (IS). The results indicate that our repaired images improve both visually and numerically. For example, with regular missing regions on the CelebA dataset, our FID is 12.893, a 39.2% reduction (lower is better) compared with the second-best method. In addition, on the UCF Google Street View dataset, the PSNR (higher is better) is increased by 12.64%, 6.77%, and 4.41%, respectively.
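For reference, PSNR follows directly from the mean square error; a minimal implementation for images scaled to [0, 1]:

```python
import numpy as np

def psnr(real, fake, peak=1.0):
    # PSNR = 10 * log10(peak^2 / MSE); higher means closer to the original.
    mse = np.mean((real.astype(np.float64) - fake.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```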
Meanwhile, we carry out ablation studies to verify the effectiveness of the proposed dual decoder, the loss functions, the multi-scale attention module, and the U-Net structure. Our model effectively enhances the visual consistency between the repaired and real images and produces more plausible content for the missing regions.
Conclusion
A novel image inpainting model is presented, based on multiple optimizations of the network structure, training time, and inpainting results. The proposed method effectively reduces the training time of the model via the dual decoder and simultaneously resolves the dependency issue of two-stage network models. Owing to the consistency loss, perceptual loss, and multi-scale attention module, the repaired images show better visual consistency. The inpainting of complex image structures remains limited and calls for further work.
image inpainting; semantic consistency; dual decoder; skip connection; multi-scale attention module