Deep learning image inpainting combining semantic segmentation reconstruction and edge reconstruction
2022, Vol. 27, No. 12, Pages 3553-3565
Print publication date: 2022-12-16
Accepted: 2021-11-12
DOI: 10.11834/jig.210702
Hongju Yang, Liqin Li, Ding Wang. Deep learning image inpainting combining semantic segmentation reconstruction and edge reconstruction[J]. Journal of Image and Graphics, 2022,27(12):3553-3565.
Objective
Traditional image inpainting methods lack an understanding of high-level image semantics and can only cope with small damaged areas of simple structure and texture. Existing end-to-end deep learning inpainting methods overcome this limitation with the support of large numbers of training images, but because they attempt to restore the entire target under insufficient constraints, the repaired images often suffer from blurred boundaries and distorted structures. To address this, this paper proposes a deep learning image inpainting method jointly guided by semantic segmentation structure and edge structure.
Method
The method decomposes the image inpainting task into three stages: semantic segmentation reconstruction, edge reconstruction, and content completion. It first reconstructs the semantic segmentation structure of the missing region, then uses the reconstructed segmentation structure to guide the reconstruction of the edge structure of the missing region, and finally uses the reconstructed segmentation and edge structures jointly to guide the completion of the image content in the missing region, as sketched below.
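The cascade can be pictured with the following minimal PyTorch sketch. The module names, input channel layouts, and mask convention (1 inside the hole) are illustrative assumptions on our part, not the authors' released implementation.

```python
# Hypothetical sketch of the three-stage cascade described above.
import torch
import torch.nn as nn

class ThreeStageInpainter(nn.Module):
    def __init__(self, seg_net: nn.Module, edge_net: nn.Module,
                 content_net: nn.Module):
        super().__init__()
        self.seg_net = seg_net          # stage 1: segmentation reconstruction
        self.edge_net = edge_net        # stage 2: edge reconstruction
        self.content_net = content_net  # stage 3: content completion

    def forward(self, masked_img, mask, masked_seg, masked_edge):
        # Stage 1: complete the segmentation map inside the hole (mask = 1).
        seg = self.seg_net(torch.cat([masked_seg, mask], dim=1))
        # Stage 2: complete the edge map, conditioned on the completed
        # segmentation from stage 1.
        edge = self.edge_net(torch.cat([masked_edge, seg, mask], dim=1))
        # Stage 3: fill the image content under joint guidance of the
        # completed segmentation and edge structures.
        out = self.content_net(torch.cat([masked_img, seg, edge, mask], dim=1))
        # Known pixels are kept from the input; only the hole is synthesized.
        return masked_img * (1 - mask) + out * mask
```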
Result
The proposed method is compared with other state-of-the-art inpainting methods on the CelebAMask-HQ (CelebFaces attributes mask high quality) face dataset and the Cityscapes urban-scene dataset. With mask ratios of 50%~60%, compared with the second-best method, our method reduces the mean absolute error by 4.5% and improves the peak signal-to-noise ratio by 1.6% and the structural similarity by 1.7% on CelebAMask-HQ; on Cityscapes, it reduces the mean absolute error by 4.2% and improves the peak signal-to-noise ratio by 1.5% and the structural similarity by 1.9%. The results show that our method outperforms the compared methods on all three metrics, and the generated images have clear boundaries and are visually more plausible.
Conclusion
Under the joint guidance of semantic segmentation structure and edge structure, the proposed three-stage inpainting method effectively reduces structural reconstruction errors. When the repair involves large missing areas, it achieves higher inpainting quality than existing methods.
Objective
Image inpainting reconstructs the missing regions of damaged images. The technique is widely used in scenarios such as image editing, image denoising, and cultural relics preservation. Conventional inpainting methods fill missing pixels with patches sampled from the known region, or propagate pixels into the missing region via a diffusion mechanism. These methods perform well on regular textures and small defects. However, because they lack a semantic understanding of the image, the generated results often exhibit semantically inconsistent, non-photorealistic structures when large holes must be filled. Deep learning-based inpainting methods can learn high-level semantic information from large amounts of data. Although these methods have made significant progress in image inpainting, they often fail to reconstruct plausible structures: they attempt to restore the entire target without sufficient constraints, so the generated images frequently suffer from blurred boundaries and distorted structures.
Method
Our research develops a deep image inpainting method jointly guided by semantic segmentation and edges. It divides the inpainting task into three steps: 1) semantic segmentation reconstruction, 2) edge reconstruction, and 3) content restoration. First, the semantic segmentation reconstruction module reconstructs the segmentation structure of the missing area. Then, the reconstructed segmentation structure guides the reconstruction of the edge structure of the missing area. Finally, the reconstructed segmentation and edge structures jointly guide the restoration of the missing content. Semantic segmentation represents the global structural information of the image well: 1) reconstructing the segmentation structure improves the accuracy of the subsequent edge reconstruction; 2) edges carry rich structural information, so reconstructing the edge structure helps generate finer details inside objects; 3) under the joint guidance of the reconstructed segmentation and edge structures, content restoration produces images with clear boundaries, more reasonable structures, and more realistic textures.
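Training such a pipeline requires ground-truth edge maps as targets. The abstract does not name the edge detector, so the snippet below is only one plausible preprocessing choice (Canny, as commonly used in edge-guided inpainting); the thresholds are arbitrary assumptions.

```python
# Hypothetical preprocessing: derive a binary edge target from the
# ground-truth image with Canny (detector and thresholds are our
# assumptions, not confirmed details of the paper).
import cv2
import numpy as np

def edge_target(img_bgr: np.ndarray, lo: int = 100, hi: int = 200) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, lo, hi)          # uint8 map, 255 on edges
    return (edges > 0).astype(np.float32)    # binary {0, 1} edge target
```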
The network is based on a generative adversarial network (GAN), comprising a generator and a discriminator. The generator uses an encoder-decoder structure, and the discriminator uses a 70 × 70 PatchGAN structure.
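A 70 × 70 PatchGAN scores overlapping patches rather than the whole image, so each output logit has a 70 × 70 receptive field. Below is a minimal pix2pix-style sketch; the channel widths and normalization choice are assumptions, not confirmed details of this paper.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """70 x 70 PatchGAN: outputs a grid of real/fake logits, each with a
    70 x 70 receptive field (illustrative pix2pix-style sketch)."""
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1),
                nn.InstanceNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True))
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1),   # no norm on the first layer
            nn.LeakyReLU(0.2, inplace=True),
            block(base, base * 2, 2),
            block(base * 2, base * 4, 2),
            block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, 1, 1))   # per-patch real/fake logits

    def forward(self, x):
        return self.net(x)
```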
A joint loss is adopted in each of the three steps, driving the intermediate result of every step toward the ground truth. The semantic segmentation and edge reconstruction modules use an adversarial loss and a feature matching loss; our feature matching loss also includes an L1 term. Feature matching loss is similar to perceptual loss and relaxes the requirement for a single exact ground-truth segmentation or edge structure. The content restoration module additionally uses a perceptual loss and a style loss, where the style loss reduces the "checkerboard" artifacts caused by transposed convolution layers.
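The feature matching and style losses can be sketched as follows. Here `feats_real`/`feats_fake` are lists of intermediate discriminator (or VGG) activations; the layer choices and weighting are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def feature_matching_loss(feats_real, feats_fake):
    # L1 distance between intermediate discriminator activations for the
    # real and generated images, averaged over layers.
    losses = [F.l1_loss(f, r.detach()) for r, f in zip(feats_real, feats_fake)]
    return sum(losses) / len(losses)

def gram_matrix(feat):
    # Channel-wise feature correlations, normalized by the feature size.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(vgg_feats_real, vgg_feats_fake):
    # Match Gram matrices of (e.g. VGG) features; the standard style loss
    # used to suppress checkerboard artifacts from transposed convolutions.
    return sum(F.l1_loss(gram_matrix(f), gram_matrix(r))
               for r, f in zip(vgg_feats_real, vgg_feats_fake))
```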
Result
First, we analyze the performance of the semantic segmentation reconstruction module quantitatively and qualitatively. The results show that the module reconstructs plausible segmentation structures: with small masks the pixel accuracy reaches 99.16%, and even with large masks it still reaches 92.64%. Next, we compare the edge reconstruction results quantitatively; the precision and recall of the reconstructed edge structure improve further under the guidance of the segmentation structure.
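The quantities reported here can be computed with a straightforward NumPy sketch, assuming integer label maps for segmentation and binary edge maps.

```python
import numpy as np

def pixel_accuracy(pred_labels: np.ndarray, gt_labels: np.ndarray) -> float:
    # Fraction of pixels whose predicted class equals the ground truth.
    return float((pred_labels == gt_labels).mean())

def edge_precision_recall(pred_edge: np.ndarray, gt_edge: np.ndarray):
    # pred_edge, gt_edge: binary maps with 1 on edge pixels.
    tp = float(np.logical_and(pred_edge == 1, gt_edge == 1).sum())
    precision = tp / max(float((pred_edge == 1).sum()), 1.0)
    recall = tp / max(float((gt_edge == 1).sum()), 1.0)
    return precision, recall
```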
Finally, the proposed method is compared with four popular inpainting methods on the CelebAMask-HQ (CelebFaces attributes mask high quality) and Cityscapes datasets. When the mask ratio is 50%~60%, compared with the second-best method, the mean absolute error (MAE) on CelebAMask-HQ is reduced by 4.5%, the peak signal-to-noise ratio (PSNR) is increased by 1.6%, and the structural similarity index measure (SSIM) is increased by 1.7%; on Cityscapes, the MAE is reduced by 4.2%, the PSNR is increased by 1.5%, and the SSIM is increased by 1.9%. Our method performs best on all three metrics (MAE, PSNR, and SSIM), and the generated images have clearer boundaries and better visual quality.
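The three metrics can be reproduced with scikit-image; the snippet below assumes uint8 RGB images and scikit-image ≥ 0.19 (for the `channel_axis` argument).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred: np.ndarray, gt: np.ndarray):
    # pred, gt: uint8 RGB images of identical shape.
    mae = np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64)))
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return mae, psnr, ssim
```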
Conclusion
Our three-step inpainting method introduces the guidance of semantic segmentation structure, which significantly improves the accuracy of edge reconstruction. In addition, the joint guidance of segmentation and edge structures effectively reduces structural reconstruction errors. The method achieves higher inpainting quality than existing approaches on inpainting tasks with large missing areas.
image inpainting; generative adversarial network (GAN); semantic segmentation; edge detection; deep learning