HDA-GAN: hybrid dual attention generative adversarial network for image inpainting
2023, Vol. 28, No. 11, Pages 3440-3452
Print publication date: 2023-11-16
DOI: 10.11834/jig.220919
Lan Zhi, Yan Caiping, Li Hong, Zheng Yadan. 2023. HDA-GAN: hybrid dual attention generative adversarial network for image inpainting. Journal of Image and Graphics, 28(11):3440-3452
Objective
Image inpainting refers to filling the missing or corrupted parts of an image with plausible content. Despite the great progress of generative adversarial networks (GANs), most existing methods still produce distorted structures and blurry textures when the missing region is large. One major reason is the locality of the convolution operation, which disregards global or long-range structural information and merely enlarges the local receptive field.
Method
To overcome this problem, a novel image inpainting network, the hybrid dual attention generative adversarial network (HDA-GAN), is proposed to capture global structural information and local detailed textures simultaneously. Specifically, HDA-GAN integrates two types of modules, cascaded channel-attention propagation modules and cascaded self-attention propagation modules, into different layers of the network. For the cascaded channel-attention propagation module, several multi-scale channel-attention blocks are cascaded in the higher layers of the network to learn features ranging from low-level details to high-level semantics. For the cascaded self-attention propagation module, several patch-based self-attention blocks are cascaded in the middle and lower layers to capture long-range dependencies while preserving more details; a sketch of such a block is given below. Each cascaded module stacks several identical attention blocks into different layers, which strengthens the propagation of local textures into global structures.
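As an illustration of the patch-based self-attention design, the following is a minimal PyTorch sketch in which attention is computed independently inside non-overlapping patches, so pixel interactions are modeled at a fraction of the cost of global self-attention, and a zero-initialized residual gate controls how strongly the attended features are injected. The class name, shapes, and hyperparameters are assumptions for exposition, not the authors' implementation.

```python
# A minimal sketch of patch-based self-attention (hypothetical, for exposition).
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    def __init__(self, channels, patch_size=8):
        super().__init__()
        self.patch_size = patch_size
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Zero-initialized gate: the block starts as an identity mapping.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        assert h % p == 0 and w % p == 0, "spatial size must be divisible by patch_size"
        q, k, v = self.qkv(x).chunk(3, dim=1)

        def to_patches(t):  # (B, C, H, W) -> (B * nPatches, C, p * p)
            t = t.reshape(b, c, h // p, p, w // p, p)
            return t.permute(0, 2, 4, 1, 3, 5).reshape(-1, c, p * p)

        q, k, v = to_patches(q), to_patches(k), to_patches(v)
        # Attention restricted to each patch: (B * nPatches, p*p, p*p).
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        out = v @ attn.transpose(1, 2)  # (B * nPatches, C, p * p)
        out = (out.reshape(b, h // p, w // p, c, p, p)
                  .permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w))
        return x + self.gate * self.proj(out)
```

Restricting attention to p × p patches shrinks the attention matrix from (HW)² to (HW) · p² entries; cascading several such blocks, interleaved with convolutions, is what lets information propagate across patch boundaries in the spirit of the cascaded propagation described above.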
Result
Extensive experiments are conducted on the Paris Street View and CelebA-HQ (CelebA-high quality) datasets with three objective evaluation metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM); a sketch of how these metrics are computed is given below. In the quantitative comparison on the Paris Street View dataset, HDA-GAN improves PSNR over the Edge-LBAM (edge-guided learnable bidirectional attention maps) method by 1.28 dB, 1.13 dB, 0.93 dB, and 0.80 dB and SSIM by 5.2%, 8.2%, 10.6%, and 13.1% across different mask ratios. Likewise, on the CelebA-HQ dataset, HDA-GAN reduces MSE relative to the AOT-GAN (aggregated contextual transformations generative adversarial network) method by 2.2%, 5.4%, 11.1%, 18.5%, and 28.1% and improves PSNR by 0.93 dB, 0.68 dB, 0.73 dB, 0.84 dB, and 0.74 dB across different mask ratios. Visual comparisons also clearly show that the inpainting results are superior to those of the above methods.
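For reference, the sketch below shows one way to compute these three metrics with NumPy and scikit-image; the function name and the uint8 RGB input assumption are illustrative, and the exact evaluation protocol of the paper may differ.

```python
# Illustrative metric computation for inpainting evaluation.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(gt, pred):
    """gt, pred: uint8 RGB images with identical shape (H, W, 3)."""
    gt_f = gt.astype(np.float64)
    pred_f = pred.astype(np.float64)
    mse = np.mean((gt_f - pred_f) ** 2)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)  # in dB
    # channel_axis requires scikit-image >= 0.19 (older versions use multichannel=True).
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}
```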
Conclusion
The proposed image inpainting method takes full advantage of deep learning models for feature learning and image generation, making the restoration of missing or corrupted parts of an image more accurate.
Objective
Image inpainting has been extensively examined as a basic topic in the field of image processing over the past two decades. It attempts to fill in the missing or corrupted parts of an image with satisfactory and reasonable content. Traditional techniques can succeed in certain straightforward situations but, given their inability to generate semantically consistent content, fall short when the missing region is large or complex. Image inpainting methods based on deep learning and adversarial learning have produced increasingly promising results in recent years. However, most of these methods produce distorted structures and blurry textures when the missing region is large. One primary cause of this problem is that these methods do not consider global or long-range structural information due to the locality of vanilla convolution operations, even with dilated convolutions that enlarge the local receptive field.
Method
To overcome this issue, this study proposes a novel image inpainting network called the hybrid dual attention generative adversarial network (HDA-GAN), which captures both global structural information and local detailed textures. Specifically, HDA-GAN integrates two types of cascaded attention propagation modules, namely, cascaded channel-attention propagation and cascaded self-attention propagation, into different convolutional layers of the generator network. For the cascaded channel-attention propagation module, several multi-scale channel-attention blocks are cascaded into shallow layers to learn features ranging from low-level details to high-level semantics. The multi-scale channel-attention block adopts a split-attention-merge strategy and residual-gated operations to aggregate multiple channel attention correlations, enhancing high-level semantics while preserving low-level details; a sketch of one plausible realization follows this paragraph. For the cascaded self-attention propagation module, several positional-separated self-attention blocks are stacked into middle and deep layers. These blocks adopt the same split-attention-merge strategy and residual-gated operations as the multi-scale channel-attention blocks, with minor modifications. This design allows the positional-separated self-attention blocks to preserve details while learning long-range semantic interactions, and it further reduces computational complexity compared with the original self-attention.
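As one plausible reading of the split-attention-merge description, the PyTorch sketch below splits the input into multi-scale dilated-convolution branches, reweights the concatenated branches with squeeze-and-excitation-style channel attention, and fuses them back through a gated residual connection. The branch dilations, reduction ratio, and gating scheme are assumptions, not the paper's exact design.

```python
# A hypothetical multi-scale channel-attention block (split-attention-merge
# with a gated residual), sketched for exposition only.
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8), reduction=4):
        super().__init__()
        # Split: one dilated 3x3 branch per rate to gather multi-scale context.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        merged = channels * len(dilations)
        # Attention: squeeze-and-excitation over the concatenated branches.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(merged, merged // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(merged // reduction, merged, 1),
            nn.Sigmoid(),
        )
        # Merge: fuse the reweighted branches back to the input width.
        self.fuse = nn.Conv2d(merged, channels, 1)
        # Zero-initialized residual gate, so the block starts as an identity.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        feats = feats * self.se(feats)           # channel-attention reweighting
        return x + self.gate * self.fuse(feats)  # gated residual merge
```

The zero-initialized gate is one common way to realize a residual-gated operation: the block contributes nothing at initialization and learns how much attended context to mix in, which keeps low-level details intact early in training.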
Result
Extensive experiments on the Paris Street View and CelebA-HQ datasets demonstrate that HDA-GAN produces superior inpainting results, both quantitatively and qualitatively, compared with several state-of-the-art algorithms. The Paris Street View dataset includes 15 000 street images of Paris, split into 14 900 training images and 100 test images, while the CelebA-HQ dataset contains 30 000 high-quality human face images whose high-frequency features of hair and skin allow the fine-grained texture synthesis of models to be evaluated. Following the standard configuration, 28 000 of these images are used for training and 2 000 for testing. Free-form masks are employed in both training and testing under the standard settings; such masks closely reflect real-world damage and are therefore used by many inpainting techniques (a common recipe for generating them is sketched below). Following the standard setting, all images are resized to 512 × 512 pixels or 256 × 256 pixels for training and testing, depending on the dataset. The mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) are used to evaluate the performance of different methods in filling holes with different hole-to-image region ratios. On the Paris Street View dataset, the PSNR of the proposed method increases by 1.28 dB, 1.13 dB, 0.93 dB, and 0.80 dB, while its SSIM increases by 5.2%, 8.2%, 10.6%, and 13.1% compared with the Edge-LBAM method as the hole-to-image region ratio increases. Meanwhile, on the CelebA-HQ dataset, the MSE of the proposed method decreases by 2.2%, 5.4%, 11.1%, 18.5%, and 28.1%, while its PSNR increases by 0.93 dB, 0.68 dB, 0.73 dB, 0.84 dB, and 0.74 dB compared with the AOT-GAN method as the hole-to-image region ratio increases. These experimental results show that the proposed method quantitatively and qualitatively outperforms the other algorithms.
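Free-form masks of this kind are typically generated as random brush strokes, as popularized in free-form inpainting work such as gated convolution (Yu et al., 2019). The sketch below follows that common recipe with illustrative parameters; it is not necessarily the exact mask generator used in this paper.

```python
# A common random-stroke recipe for free-form masks (parameters illustrative).
import numpy as np
import cv2

def random_stroke_mask(h=256, w=256, max_strokes=8, max_vertices=12,
                       max_len=60.0, max_width=20, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(int(rng.integers(1, max_strokes + 1))):
        x, y = int(rng.integers(0, w)), int(rng.integers(0, h))
        angle = rng.uniform(0.0, 2.0 * np.pi)
        width = int(rng.integers(5, max_width + 1))
        for _ in range(int(rng.integers(1, max_vertices + 1))):
            angle += rng.uniform(-0.5, 0.5)          # random walk in direction
            length = rng.uniform(10.0, max_len)
            nx = int(np.clip(x + length * np.cos(angle), 0, w - 1))
            ny = int(np.clip(y + length * np.sin(angle), 0, h - 1))
            cv2.line(mask, (x, y), (nx, ny), color=255, thickness=width)
            x, y = nx, ny
    return mask  # 255 marks the hole region to be inpainted
```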
Conclusion
This study proposes a novel hybrid dual attention generative adversarial network for image inpainting, called HDA-GAN, which can generate reasonable and satisfactory content for a corrupted image by fusing two carefully designed attention propagation modules. Using the cascaded attention propagation modules in the skip-connection layers significantly improves the global structure and local texture captured by the generator, which is crucial for inpainting, particularly when filling complex missing regions or large holes. In future work, the cascaded attention propagation modules will be applied to other vision tasks, such as image denoising, image translation, and single-image super-resolution.
Keywords: image inpainting; generative adversarial network (GAN); cascaded channel attention propagation module; cascaded self-attention propagation module; large area inpainting
Barnes C, Shechtman E, Finkelstein A and Goldman D B. 2009. PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3): #24 [DOI: 10.1145/1531326.1531330]
Doersch C, Singh S, Gupta A, Sivic J and Efros A A. 2012. What makes Paris look like Paris? ACM Transactions on Graphics, 31(4): #101 [DOI: 10.1145/2185520.2185597]
Drori I, Cohen-Or D and Yeshurun H. 2003. Fragment-based image completion//Proceedings of ACM SIGGRAPH 2003. San Diego, USA: Association for Computing Machinery: 303-312 [DOI: 10.1145/1201775.882267]
Fu J, Liu J, Tian H J, Li Y, Bao Y J, Fang Z W and Lu H Q. 2019. Dual attention network for scene segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3141-3149 [DOI: 10.1109/CVPR.2019.00326]
Gatys L A, Ecker A S and Bethge M. 2016. Image style transfer using convolutional neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2414-2423 [DOI: 10.1109/CVPR.2016.265]
Iizuka S, Simo-Serra E and Ishikawa H. 2017. Globally and locally consistent image completion. ACM Transactions on Graphics, 36(4): #107 [DOI: 10.1145/3072959.3073659]
Johnson J, Alahi A and Li F F. 2016. Perceptual losses for real-time style transfer and super-resolution//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 694-711 [DOI: 10.1007/978-3-319-46475-6_43]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net
Komodakis N and Tziritas G. 2007. Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Transactions on Image Processing, 16(11): 2649-2661 [DOI: 10.1109/TIP.2007.906269]
Lin G S, Milan A, Shen C H and Reid I. 2017. RefineNet: multi-path refinement networks for high-resolution semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5168-5177 [DOI: 10.1109/CVPR.2017.549]
Liu G L, Reda F A, Shih K J, Wang T C, Tao A and Catanzaro B. 2018. Image inpainting for irregular holes using partial convolutions//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 89-105 [DOI: 10.1007/978-3-030-01252-6_6]
Liu H Y, Jiang B, Song Y B, Huang W and Yang C. 2020. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 725-741 [DOI: 10.1007/978-3-030-58536-5_43]
Liu H Y, Jiang B, Xiao Y and Yang C. 2019. Coherent semantic attention for image inpainting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 4169-4178 [DOI: 10.1109/ICCV.2019.00427]
Liu K H, Wang X H, Xie Y T and Hu J Y. 2021. Edge-guided GAN: a depth image inpainting approach guided by edge information. Journal of Image and Graphics, 26(1): 186-197 [DOI: 10.11834/jig.200509]
Nazeri K, Ng E, Joseph T, Qureshi F and Ebrahimi M. 2019. EdgeConnect: structure guided image inpainting using edge prediction//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 3265-3274 [DOI: 10.1109/ICCVW.2019.00408]
Pathak D, Krähenbühl P, Donahue J, Darrell T and Efros A A. 2016. Context encoders: feature learning by inpainting//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2536-2544 [DOI: 10.1109/CVPR.2016.278]
Qiang Z P, He L B, Chen X and Xu D. 2019. Survey on deep learning image inpainting methods. Journal of Image and Graphics, 24(3): 447-463 [DOI: 10.11834/jig.180408]
Sagong M C, Shin Y G, Kim S W, Park S and Ko S J. 2019. PEPSI: fast image inpainting with parallel decoding network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 11352-11360 [DOI: 10.1109/CVPR.2019.01162]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-09-01]. https://arxiv.org/pdf/1409.1556.pdf
Wang D S, Xie C H, Liu S H, Niu Z X and Zuo W M. 2021. Image inpainting with edge-guided learnable bidirectional attention maps [EB/OL]. [2022-09-01]. https://arxiv.org/pdf/2104.12087.pdf
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5987-5995 [DOI: 10.1109/CVPR.2017.634]
Yang H J, Li L Q and Wang D. 2022. Deep learning image inpainting combining semantic segmentation reconstruction and edge reconstruction. Journal of Image and Graphics, 27(12): 3553-3565 [DOI: 10.11834/jig.210702]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T S. 2018. Generative image inpainting with contextual attention//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5505-5514 [DOI: 10.1109/CVPR.2018.00577]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T. 2019. Free-form image inpainting with gated convolution//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 4470-4479 [DOI: 10.1109/ICCV.2019.00457]
Zeng Y H, Fu J L, Chao H Y and Guo B N. 2019. Learning pyramid-context encoder network for high-quality image inpainting//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1486-1494 [DOI: 10.1109/CVPR.2019.00158]
Zeng Y H, Fu J L, Chao H Y and Guo B N. 2023. Aggregated contextual transformations for high-resolution image inpainting. IEEE Transactions on Visualization and Computer Graphics, 29(7): 3266-3280 [DOI: 10.1109/TVCG.2022.3156949]
Zhang G M and Li Y B. 2019. Image inpainting of fractional TV model combined with texture structure. Journal of Image and Graphics, 24(5): 700-713 [DOI: 10.11834/jig.180509]
Zhang H, Goodfellow I J, Metaxas D N and Odena A. 2019. Self-attention generative adversarial networks//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 7354-7363