HDA-GAN: hybrid dual attention generative adversarial network for image inpainting
2023, Vol. 28, No. 11, Pages 3440-3452
Print publication date: 2023-11-16
DOI: 10.11834/jig.220919
Lan Zhi, Yan Caiping, Li Hong, Zheng Yadan. 2023. HDA-GAN: hybrid dual attention generative adversarial network for image inpainting. Journal of Image and Graphics, 28(11):3440-3452
Objective
Image inpainting refers to filling the missing or corrupted parts of an image with plausible content. Despite the great progress of generative adversarial networks (GANs), most existing methods still produce distorted structures and blurry textures when the missing region is large. One major reason is the locality of the convolution operation, which disregards global or long-range structural information and merely enlarges the local receptive field.
Method
To overcome this problem, a novel image inpainting network, the hybrid dual attention generative adversarial network (HDA-GAN), is proposed to capture global structural information and local detailed textures simultaneously. Specifically, HDA-GAN integrates two types of modules, cascaded channel-attention propagation modules and cascaded self-attention propagation modules, into different layers of the network. For the cascaded channel-attention propagation module, several multi-scale channel-attention blocks are cascaded in the higher layers of the network to learn features ranging from low-level details to high-level semantics. For the cascaded self-attention propagation module, several patch-based self-attention blocks are cascaded in the middle and lower layers to capture long-range dependencies while preserving more details; a sketch of such a block is given below. Each cascaded module stacks several identical attention blocks into different layers, which strengthens the propagation of local textures into global structures.
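As an illustration of the patch-based self-attention design, the following is a minimal PyTorch sketch in which attention is computed independently inside non-overlapping patches, so pixel interactions are modeled at a fraction of the cost of global self-attention, and a zero-initialized residual gate controls how strongly the attended features are injected. The class name, shapes, and hyperparameters are assumptions for exposition, not the authors' implementation.

```python
# A minimal sketch of patch-based self-attention (hypothetical, for exposition).
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    def __init__(self, channels, patch_size=8):
        super().__init__()
        self.patch_size = patch_size
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Zero-initialized gate: the block starts as an identity mapping.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        assert h % p == 0 and w % p == 0, "spatial size must be divisible by patch_size"
        q, k, v = self.qkv(x).chunk(3, dim=1)

        def to_patches(t):  # (B, C, H, W) -> (B * nPatches, C, p * p)
            t = t.reshape(b, c, h // p, p, w // p, p)
            return t.permute(0, 2, 4, 1, 3, 5).reshape(-1, c, p * p)

        q, k, v = to_patches(q), to_patches(k), to_patches(v)
        # Attention restricted to each patch: (B * nPatches, p*p, p*p).
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        out = v @ attn.transpose(1, 2)  # (B * nPatches, C, p * p)
        out = (out.reshape(b, h // p, w // p, c, p, p)
                  .permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w))
        return x + self.gate * self.proj(out)
```

Restricting attention to p × p patches shrinks the attention matrix from (HW)² to (HW) · p² entries; cascading several such blocks, interleaved with convolutions, is what lets information propagate across patch boundaries in the spirit of the cascaded propagation described above.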
Result
Extensive experiments are conducted on the Paris Street View and CelebA-HQ (CelebA-high quality) datasets with three objective evaluation metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM); a sketch of how these metrics are computed is given below. In the quantitative comparison on the Paris Street View dataset, HDA-GAN improves PSNR over the Edge-LBAM (edge-guided learnable bidirectional attention maps) method by 1.28 dB, 1.13 dB, 0.93 dB, and 0.80 dB and SSIM by 5.2%, 8.2%, 10.6%, and 13.1% across different mask ratios. Likewise, on the CelebA-HQ dataset, HDA-GAN reduces MSE relative to the AOT-GAN (aggregated contextual transformations generative adversarial network) method by 2.2%, 5.4%, 11.1%, 18.5%, and 28.1% and improves PSNR by 0.93 dB, 0.68 dB, 0.73 dB, 0.84 dB, and 0.74 dB across different mask ratios. Visual comparisons also clearly show that the inpainting results are superior to those of the above methods.
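For reference, the sketch below shows one way to compute these three metrics with NumPy and scikit-image; the function name and the uint8 RGB input assumption are illustrative, and the exact evaluation protocol of the paper may differ.

```python
# Illustrative metric computation for inpainting evaluation.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(gt, pred):
    """gt, pred: uint8 RGB images with identical shape (H, W, 3)."""
    gt_f = gt.astype(np.float64)
    pred_f = pred.astype(np.float64)
    mse = np.mean((gt_f - pred_f) ** 2)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)  # in dB
    # channel_axis requires scikit-image >= 0.19 (older versions use multichannel=True).
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}
```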
Conclusion
The proposed image inpainting method takes full advantage of deep learning models for feature learning and image generation, making the restoration of missing or corrupted parts of an image more accurate.
Objective
Image inpainting has been extensively examined as a basic topic in the field of image processing over the past two decades. It attempts to fill in the missing or corrupted parts of an image with satisfactory and reasonable content. Traditional techniques can succeed in certain straightforward situations but, given their inability to generate semantically consistent content, fall short when the missing region is large or complex. Image inpainting methods based on deep learning and adversarial learning have produced increasingly promising results in recent years. However, most of these methods produce distorted structures and blurry textures when the missing region is large. One primary cause of this problem is that these methods do not consider global or long-range structural information due to the locality of vanilla convolution operations, even with dilated convolutions that enlarge the local receptive field.
Method
To overcome this issue, this study proposes a novel image inpainting network called the hybrid dual attention generative adversarial network (HDA-GAN), which captures both global structural information and local detailed textures. Specifically, HDA-GAN integrates two types of cascaded attention propagation modules, namely, cascaded channel-attention propagation and cascaded self-attention propagation, into different convolutional layers of the generator network. For the cascaded channel-attention propagation module, several multi-scale channel-attention blocks are cascaded into shallow layers to learn features ranging from low-level details to high-level semantics. The multi-scale channel-attention block adopts a split-attention-merge strategy and residual-gated operations to aggregate multiple channel attention correlations, enhancing high-level semantics while preserving low-level details; a sketch of one plausible realization follows this paragraph. For the cascaded self-attention propagation module, several positional-separated self-attention blocks are stacked into middle and deep layers. These blocks adopt the same split-attention-merge strategy and residual-gated operations as the multi-scale channel-attention blocks, with minor modifications. This design allows the positional-separated self-attention blocks to preserve details while learning long-range semantic interactions, and it further reduces computational complexity compared with the original self-attention.
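As one plausible reading of the split-attention-merge description, the PyTorch sketch below splits the input into multi-scale dilated-convolution branches, reweights the concatenated branches with squeeze-and-excitation-style channel attention, and fuses them back through a gated residual connection. The branch dilations, reduction ratio, and gating scheme are assumptions, not the paper's exact design.

```python
# A hypothetical multi-scale channel-attention block (split-attention-merge
# with a gated residual), sketched for exposition only.
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8), reduction=4):
        super().__init__()
        # Split: one dilated 3x3 branch per rate to gather multi-scale context.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        merged = channels * len(dilations)
        # Attention: squeeze-and-excitation over the concatenated branches.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(merged, merged // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(merged // reduction, merged, 1),
            nn.Sigmoid(),
        )
        # Merge: fuse the reweighted branches back to the input width.
        self.fuse = nn.Conv2d(merged, channels, 1)
        # Zero-initialized residual gate, so the block starts as an identity.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        feats = feats * self.se(feats)           # channel-attention reweighting
        return x + self.gate * self.fuse(feats)  # gated residual merge
```

The zero-initialized gate is one common way to realize a residual-gated operation: the block contributes nothing at initialization and learns how much attended context to mix in, which keeps low-level details intact early in training.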
Result
Extensive experiments on the Paris Street View and CelebA-HQ datasets demonstrate that HDA-GAN produces superior inpainting results, both quantitatively and qualitatively, compared with several state-of-the-art algorithms. The Paris Street View dataset includes 15 000 street images of Paris, split into 14 900 training images and 100 test images, while the CelebA-HQ dataset contains 30 000 high-quality human face images whose high-frequency features of hair and skin allow the fine-grained texture synthesis of models to be evaluated. Following the standard configuration, 28 000 of these images are used for training and 2 000 for testing. Free-form masks are employed in both training and testing under the standard settings; such masks closely reflect real-world damage and are therefore used by many inpainting techniques (a common recipe for generating them is sketched below). Following the standard setting, all images are resized to 512 × 512 pixels or 256 × 256 pixels for training and testing, depending on the dataset. The mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) are used to evaluate the performance of different methods in filling holes with different hole-to-image region ratios. On the Paris Street View dataset, the PSNR of the proposed method increases by 1.28 dB, 1.13 dB, 0.93 dB, and 0.80 dB, while its SSIM increases by 5.2%, 8.2%, 10.6%, and 13.1% compared with the Edge-LBAM method as the hole-to-image region ratio increases. Meanwhile, on the CelebA-HQ dataset, the MSE of the proposed method decreases by 2.2%, 5.4%, 11.1%, 18.5%, and 28.1%, while its PSNR increases by 0.93 dB, 0.68 dB, 0.73 dB, 0.84 dB, and 0.74 dB compared with the AOT-GAN method as the hole-to-image region ratio increases. These experimental results show that the proposed method quantitatively and qualitatively outperforms the other algorithms.
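Free-form masks of this kind are typically generated as random brush strokes, as popularized in free-form inpainting work such as gated convolution (Yu et al., 2019). The sketch below follows that common recipe with illustrative parameters; it is not necessarily the exact mask generator used in this paper.

```python
# A common random-stroke recipe for free-form masks (parameters illustrative).
import numpy as np
import cv2

def random_stroke_mask(h=256, w=256, max_strokes=8, max_vertices=12,
                       max_len=60.0, max_width=20, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(int(rng.integers(1, max_strokes + 1))):
        x, y = int(rng.integers(0, w)), int(rng.integers(0, h))
        angle = rng.uniform(0.0, 2.0 * np.pi)
        width = int(rng.integers(5, max_width + 1))
        for _ in range(int(rng.integers(1, max_vertices + 1))):
            angle += rng.uniform(-0.5, 0.5)          # random walk in direction
            length = rng.uniform(10.0, max_len)
            nx = int(np.clip(x + length * np.cos(angle), 0, w - 1))
            ny = int(np.clip(y + length * np.sin(angle), 0, h - 1))
            cv2.line(mask, (x, y), (nx, ny), color=255, thickness=width)
            x, y = nx, ny
    return mask  # 255 marks the hole region to be inpainted
```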
Conclusion
This study proposes a novel hybrid dual attention generative adversarial network for image inpainting, called HDA-GAN, which can generate reasonable and satisfactory content for a corrupted image by fusing two carefully designed attention propagation modules. Using the cascaded attention propagation modules in the skip-connection layers significantly improves the global structure and local texture captured by the generator, which is crucial for inpainting, particularly when filling complex missing regions or large holes. In future work, the cascaded attention propagation modules will be applied to other vision tasks, such as image denoising, image translation, and single-image super-resolution.
Keywords: image inpainting; generative adversarial network (GAN); cascaded channel attention propagation module; cascaded self-attention propagation module; large area inpainting
Barnes C, Shechtman E, Finkelstein A and Goldman D B. 2009. PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3): #24 [DOI: 10.1145/1531326.1531330]
Doersch C, Singh S, Gupta A, Sivic J and Efros A A. 2012. What makes Paris look like Paris? ACM Transactions on Graphics, 31(4): #101 [DOI: 10.1145/2185520.2185597]
Drori I, Cohen-Or D and Yeshurun H. 2003. Fragment-based image completion//Proceedings of ACM SIGGRAPH 2003. San Diego, USA: Association for Computing Machinery: 303-312 [DOI: 10.1145/1201775.882267]
Fu J, Liu J, Tian H J, Li Y, Bao Y J, Fang Z W and Lu H Q. 2019. Dual attention network for scene segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3141-3149 [DOI: 10.1109/CVPR.2019.00326]
Gatys L A, Ecker A S and Bethge M. 2016. Image style transfer using convolutional neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2414-2423 [DOI: 10.1109/CVPR.2016.265]
Iizuka S, Simo-Serra E and Ishikawa H. 2017. Globally and locally consistent image completion. ACM Transactions on Graphics, 36(4): #107 [DOI: 10.1145/3072959.3073659]
Johnson J, Alahi A and Li F F. 2016. Perceptual losses for real-time style transfer and super-resolution//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 694-711 [DOI: 10.1007/978-3-319-46475-6_43]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net
Komodakis N and Tziritas G. 2007. Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Transactions on Image Processing, 16(11): 2649-2661 [DOI: 10.1109/TIP.2007.906269]
Lin G S, Milan A, Shen C H and Reid I. 2017. RefineNet: multi-path refinement networks for high-resolution semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5168-5177 [DOI: 10.1109/CVPR.2017.549]
Liu G L, Reda F A, Shih K J, Wang T C, Tao A and Catanzaro B. 2018. Image inpainting for irregular holes using partial convolutions//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 89-105 [DOI: 10.1007/978-3-030-01252-6_6]
Liu H Y, Jiang B, Song Y B, Huang W and Yang C. 2020. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 725-741 [DOI: 10.1007/978-3-030-58536-5_43]
Liu H Y, Jiang B, Xiao Y and Yang C. 2019. Coherent semantic attention for image inpainting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 4169-4178 [DOI: 10.1109/ICCV.2019.00427]
Liu K H, Wang X H, Xie Y T and Hu J Y. 2021. Edge-guided GAN: a depth image inpainting approach guided by edge information. Journal of Image and Graphics, 26(1): 186-197 [DOI: 10.11834/jig.200509]
Nazeri K, Ng E, Joseph T, Qureshi F and Ebrahimi M. 2019. EdgeConnect: structure guided image inpainting using edge prediction//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 3265-3274 [DOI: 10.1109/ICCVW.2019.00408]
Pathak D, Krähenbühl P, Donahue J, Darrell T and Efros A A. 2016. Context encoders: feature learning by inpainting//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2536-2544 [DOI: 10.1109/CVPR.2016.278]
Qiang Z P, He L B, Chen X and Xu D. 2019. Survey on deep learning image inpainting methods. Journal of Image and Graphics, 24(3): 447-463 [DOI: 10.11834/jig.180408]
Sagong M C, Shin Y G, Kim S W, Park S and Ko S J. 2019. PEPSI: fast image inpainting with parallel decoding network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 11352-11360 [DOI: 10.1109/CVPR.2019.01162]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-09-01]. https://arxiv.org/pdf/1409.1556.pdf
Wang D S, Xie C H, Liu S H, Niu Z X and Zuo W M. 2021. Image inpainting with edge-guided learnable bidirectional attention maps [EB/OL]. [2022-09-01]. https://arxiv.org/pdf/2104.12087.pdf
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5987-5995 [DOI: 10.1109/CVPR.2017.634]
Yang H J, Li L Q and Wang D. 2022. Deep learning image inpainting combining semantic segmentation reconstruction and edge reconstruction. Journal of Image and Graphics, 27(12): 3553-3565 [DOI: 10.11834/jig.210702]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T S. 2018. Generative image inpainting with contextual attention//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5505-5514 [DOI: 10.1109/CVPR.2018.00577]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T. 2019. Free-form image inpainting with gated convolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 4470-4479 [DOI: 10.1109/ICCV.2019.00457]
Zeng Y H, Fu J L, Chao H Y and Guo B N. 2019. Learning pyramid-context encoder network for high-quality image inpainting//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1486-1494 [DOI: 10.1109/CVPR.2019.00158]
Zeng Y H, Fu J L, Chao H Y and Guo B N. 2023. Aggregated contextual transformations for high-resolution image inpainting. IEEE Transactions on Visualization and Computer Graphics, 29(7): 3266-3280 [DOI: 10.1109/TVCG.2022.3156949]
Zhang G M and Li Y B. 2019. Image inpainting of fractional TV model combined with texture structure. Journal of Image and Graphics, 24(5): 700-713 [DOI: 10.11834/jig.180509]
Zhang H, Goodfellow I J, Metaxas D N and Odena A. 2019. Self-attention generative adversarial networks//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 7354-7363