Image inpainting model with consistent global and local attributes
2020, Vol. 25, No. 12: 2505-2516
Received: 2019-12-30; Revised: 2020-03-17; Accepted: 2020-03-26; Published in print: 2020-12-16
DOI: 10.11834/jig.190681
目的 (Objective)
Image inpainting is one of the research hotspots in computer vision. Deep-learning-based image inpainting methods have achieved notable results, but they struggle with images whose global and local attributes are closely linked; in particular, when repairing large defect areas, the semantic rationality, structural coherence, and detail accuracy of their results still need improvement. To address these problems, this paper proposes an image inpainting model based on a fully convolutional network combined with the idea of generative adversarial networks.
方法 (Method)
Based on a fully convolutional neural network, and combining skip connections, dilated convolutions, and related techniques, we propose a novel image inpainting network that serves as the generator for repairing defective images. We introduce structural similarity (SSIM) as the reconstruction loss of image inpainting, supervising model learning from the perspective of the human visual system to improve inpainting quality. We use improved global and local context discriminator networks as a two-way discriminator to judge the authenticity of the inpainting results, and, combined with the adversarial loss, we propose a joint loss to supervise model training, so that the repaired region looks realistic and natural and its attributes are consistent with the whole image.
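For reference, the standard SSIM index compares two image windows $x$ and $y$ through their local means $\mu$, variances $\sigma^2$, and covariance $\sigma_{xy}$; the abstract does not restate the definition, so we assume the standard one:

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}$$

where $C_1$ and $C_2$ are small constants that stabilize the division. SSIM equals 1 only for identical windows, so $1-\mathrm{SSIM}$ is a natural choice of reconstruction loss.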
结果 (Result)
To verify the effectiveness of the proposed image inpainting model, we compare its inpainting results with those of current mainstream image inpainting algorithms on the CelebA-HQ dataset, using both subjective perception and objective indicators. The results show that the proposed method improves the semantic rationality, structural coherence, and detail accuracy of the inpainted images, reaching an average peak signal-to-noise ratio (PSNR) of 31.30 dB and an average structural similarity of 90.58%.
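For context, PSNR for 8-bit images is defined from the mean squared error (MSE) between the inpainted image and the ground truth:

$$\mathrm{PSNR}=10\log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right)\ \mathrm{dB}$$

so higher values indicate a smaller average per-pixel error.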
结论 (Conclusion)
The proposed image inpainting model better understands the high-level semantics of images, captures contextual and detail information more precisely, and produces inpainting results that better match human visual perception.
Objective
Image inpainting is a hot research topic in computer vision. In recent years, this task has been treated as a conditional image generation problem in deep learning and has received much attention from researchers. Compared with traditional algorithms, deep-learning-based image inpainting methods can be applied in more extensive scenarios with better inpainting effects. Nevertheless, these methods have limitations. For instance, their inpainting results need improvement in terms of semantic rationality, structural coherence, and detail accuracy when processing images whose global and local attributes are closely associated, especially images with a large defect area. This paper proposes a novel image inpainting model based on a fully convolutional neural network and the idea of generative adversarial networks to solve the above problems. The model optimizes the network structure, loss constraints, and training strategy to obtain improved inpainting effects.
Method
First, this paper proposes a novel image inpainting network as a generator to repair defective images, drawing on effective methods from the field of image processing. A network framework based on a fully convolutional neural network is built in the form of an encoder-decoder. In particular, we replace part of the convolutional layers in the decoding stage with dilated convolutions, stacking dilated convolutions with multiple dilation rates so that, on small feature maps, each output location covers a larger input area than ordinary convolution allows. This effectively enlarges the receptive field of the convolution kernel without increasing the amount of computation, giving the network a better understanding of the image. We also set long skip connections between corresponding encoding and decoding stages. These connections strengthen structural information by transmitting low-level features to the decoding stage, enhance the correlation among deep features, and reduce the difficulty of network training. A sketch of this generator design follows.
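The following is a minimal sketch (in PyTorch, our choice of framework; the abstract does not specify one) of the two ideas just described: stacked dilated convolutions at several dilation rates, and long skip connections from encoder to decoder. All layer sizes and the dilation rates are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch of an encoder-decoder inpainting generator with dilated convolutions
# and long skip connections. Channel counts and depths are assumptions.
import torch
import torch.nn as nn

class InpaintGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: input is the masked RGB image plus the binary mask (4 channels).
        self.enc1 = nn.Sequential(nn.Conv2d(4, 64, 5, stride=1, padding=2), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(True))
        self.enc3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(True))
        # Stacked dilated convolutions with increasing rates: the receptive field
        # grows without further downsampling and without extra parameters per layer.
        self.dilated = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(256, 256, 3, padding=r, dilation=r), nn.ReLU(True))
            for r in (2, 4, 8)
        ])
        # Decoder: input channels are doubled where a long skip connection
        # concatenates the matching encoder feature map.
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(256 + 256, 128, 4, stride=2, padding=1), nn.ReLU(True))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(128 + 128, 64, 4, stride=2, padding=1), nn.ReLU(True))
        self.dec1 = nn.Conv2d(64 + 64, 3, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d = self.dilated(e3)
        d = self.dec3(torch.cat([d, e3], dim=1))  # long skip: reuse low-level structure
        d = self.dec2(torch.cat([d, e2], dim=1))
        out = self.dec1(torch.cat([d, e1], dim=1))
        return torch.sigmoid(out)                  # completed image in [0, 1]
```

With rates 2, 4, and 8, three 3 × 3 dilated layers cover a receptive field of roughly 29 × 29 on the small feature map, versus 7 × 7 for three ordinary 3 × 3 layers, at the same parameter count.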
Second, we introduce structural similarity (SSIM) as the reconstruction loss of image inpainting. This image quality evaluation index is built from the perspective of the human visual system and differs from the common per-pixel mean square error (MSE) loss: it evaluates the similarity of two images comprehensively in terms of brightness, contrast, and structure. Used as the reconstruction loss, structural similarity effectively improves the visual quality of the inpainting results.
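A minimal sketch of such an SSIM reconstruction loss is shown below. For brevity it uses uniform local windows (average pooling) rather than the Gaussian windows of the original SSIM definition; the abstract does not state the windowing choice, so this is an assumption.

```python
# Sketch of an SSIM-based reconstruction loss; images are assumed scaled to [0, 1].
import torch
import torch.nn.functional as F

def ssim_loss(pred, target, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return 1 - mean SSIM between pred and target."""
    pad = window // 2
    mu_x = F.avg_pool2d(pred, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(target, window, stride=1, padding=pad)
    # Local (co)variances via E[xy] - E[x]E[y].
    sigma_x = F.avg_pool2d(pred * pred, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(target * target, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, window, stride=1, padding=pad) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return 1.0 - ssim_map.mean()
```

Unlike per-pixel MSE, this term penalizes mismatches in local structure and contrast, which is what the abstract credits for the improved visual effect.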
Third, we use improved global and local context discriminators as a two-way discriminator to judge the authenticity of the inpainting results. The global context discriminator guarantees the consistency of attributes between the inpainted area and the entire image, whereas the local context discriminator improves the detailed appearance of the inpainted area. Combined with the adversarial loss, this paper proposes a joint loss that improves model performance and reduces training difficulty.
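A minimal sketch of the two-branch discriminator idea follows; channel widths, depths, and the late-fusion head are illustrative assumptions rather than the paper's exact design.

```python
# Sketch of a two-way (global + local) context discriminator.
import torch
import torch.nn as nn

def conv_stack(in_ch):
    # Strided convolutions downsample; pooling and a linear layer produce a feature vector.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 5, stride=2, padding=2), nn.LeakyReLU(0.2, True),
        nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.2, True),
        nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.LeakyReLU(0.2, True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 512),
    )

class TwoWayDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_branch = conv_stack(3)   # sees the whole image: attribute consistency
        self.local_branch = conv_stack(3)    # sees a crop around the hole: detail quality
        self.head = nn.Linear(512 + 512, 1)  # fused real/fake score (logit)

    def forward(self, full_image, local_patch):
        g = self.global_branch(full_image)
        l = self.local_branch(local_patch)
        return self.head(torch.cat([g, l], dim=1))
```

The generator's joint loss would then take a form such as L = L_SSIM + λ · L_adv, where λ weights the adversarial term; the abstract does not give the weighting.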
Drawing on the training mode of generative adversarial networks, we present a method that alternately trains the image inpainting network and the image discriminative network, which yields good results. In practical applications, only the image inpainting network is used to repair defective images. One alternating training step is sketched below.
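The sketch reuses the generator, discriminator, and ssim_loss sketched above; crop_fn (which extracts the patch around the hole) and the loss weighting are hypothetical placeholders.

```python
# Sketch of one alternating GAN-style training step for the inpainting model.
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, images, masks, crop_fn, adv_weight=0.001):
    masked = images * (1 - masks)                 # masks are 1 inside the defect
    inp = torch.cat([masked, masks], dim=1)       # image + mask as generator input
    completed = generator(inp)

    # 1) Discriminator step: real images vs. completed images (generator frozen).
    d_opt.zero_grad()
    real_score = discriminator(images, crop_fn(images, masks))
    fake_score = discriminator(completed.detach(), crop_fn(completed.detach(), masks))
    d_loss = F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score)) + \
             F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score))
    d_loss.backward()
    d_opt.step()

    # 2) Generator step: joint loss = SSIM reconstruction + weighted adversarial term.
    g_opt.zero_grad()
    fake_score = discriminator(completed, crop_fn(completed, masks))
    g_loss = ssim_loss(completed, images) + \
             adv_weight * F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```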
Result
To verify the effectiveness of the proposed image inpainting model, we compare its inpainting results with those of mainstream image inpainting algorithms on the CelebA-HQ dataset, using subjective perception and objective indicators. To let each method achieve its best effect in the controlled experiments, we use the official versions of codes and examples, taking the inpainting results from loaded pre-training files or online demos. We place a specific defect mask onto 50 randomly selected images as test cases, then apply the different inpainting algorithms to repair them and collect statistics for the comparison. The CelebA-HQ dataset is a cropped and super-resolution-reconstructed version of the CelebA dataset that contains 30 000 high-resolution face images. A human face is a special kind of image that contains both specific features and a wealth of detail, so face images can fully test the expressiveness of an inpainting method. In the controlled experiments on global and local attribute consistency, the results show that our model improves on the other algorithms in semantic rationality, structural coherence, and detail performance. Subjectively, the model produces natural edge transitions and a finely detailed inpainted area. Objectively, the model reaches an average peak signal-to-noise ratio (PSNR) of 31.30 dB and an average SSIM of 90.58%, respectively, both of which exceed those of mainstream deep-learning-based inpainting algorithms. To verify its generality, we also test the model on the Places2 dataset.
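The reported averages could be computed as in the following sketch, which uses scikit-image (0.19+ for the channel_axis argument) over paired ground-truth and inpainted images; this is our illustration, not the authors' evaluation code.

```python
# Sketch of computing average PSNR and SSIM over a test set.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(originals, restored):
    """originals, restored: lists of uint8 HxWx3 arrays of equal size."""
    psnrs, ssims = [], []
    for gt, out in zip(originals, restored):
        psnrs.append(peak_signal_noise_ratio(gt, out, data_range=255))
        ssims.append(structural_similarity(gt, out, channel_axis=-1, data_range=255))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```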
Conclusion
This paper proposes a novel image inpainting model that shows improvements in network structure, loss constraints, training strategy, and inpainting results. The model develops a better understanding of the high-level semantics of images and, thanks to its more accurate grasp of context and details, obtains inpainting results that better match human visual perception. We will continue to improve the inpainting effect and explore the conditional image inpainting task in the future. Our plan is to improve and optimize the model in terms of network structure and loss constraints, reducing the information lost from an image during feature extraction while keeping network training controllable. We shall also try to make the defect mask do more work through a channel-domain attention mechanism to further improve inpainting quality, and we plan to analyze the relationship between image boundary structure and feature reconstruction. We aim to improve the convergence speed of network training and the quality of inpainting by using an accurate and effective loss function. Furthermore, we would use human-computer interaction or preset conditions to influence the inpainting results, which would give the model more practical value.