Edge-guided GAN:边界信息引导的深度图像修复
Edge-guided GAN: a depth image inpainting approach guided by edge information
2021年26卷第1期 页码: 186-197
纸质出版日期: 2021-01-16
录用日期: 2020-10-26
DOI: 10.11834/jig.200509
移动端阅览
浏览全部资源
扫码关注微信
纸质出版日期: 2021-01-16 ,
录用日期: 2020-10-26
移动端阅览
刘坤华, 王雪辉, 谢玉婷, 胡坚耀. Edge-guided GAN:边界信息引导的深度图像修复[J]. 中国图象图形学报, 2021,26(1):186-197.
Kunhua Liu, Xuehui Wang, Yuting Xie, Jianyao Hu. Edge-guided GAN: a depth image inpainting approach guided by edge information[J]. Journal of Image and Graphics, 2021,26(1):186-197.
目的
目前大多数深度图像修复方法可分为两类:色彩图像引导的方法和单个深度图像修复方法。色彩图像引导的方法利用色彩图像真值,或其上一帧、下一帧提供的信息来修复深度图像。若缺少相应信息,这类方法是无效的。单个深度图像修复方法可以修复数据缺失较少的深度图像。但是,无法修复带有孔洞(数据缺失较大)的深度图像。为解决以上问题,本文将生成对抗网络(generative adversarial network,GAN)应用于深度图像修复领域,提出了一种基于GAN的单个深度图像修复方法,即Edge-guided GAN。
方法
首先,通过Canny算法获得待修复深度图像的边界图像,并将此两个单通道图像(待修复深度图像和边界图像)合并成一个2通道数据;其次,设计Edge-guided GAN高性能的生成器、判别器和损失函数,将此2通道数据作为生成器的输入,训练生成器,以生成器生成的深度图像(假值)和深度图像真值为判别器的输入,训练判别器;最终得到深度图像修复模型,完成深度图像修复。
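A minimal sketch of the input-preparation step described above (Canny edge extraction, then stacking the deficient depth image and its edge map into 2-channel data), assuming OpenCV and NumPy; the Canny thresholds and the file name are illustrative assumptions, not values reported in the paper.

```python
import cv2
import numpy as np

def build_two_channel_input(depth_path):
    # Load the deficient depth image as a single-channel (grayscale) array.
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE)   # H x W, uint8
    # Extract the edge map of the deficient depth image with the Canny detector.
    edges = cv2.Canny(depth, 50, 150)                       # H x W, values 0 or 255
    # Stack the depth image and its edge map into 2-channel data (2 x H x W)
    # and normalize to [0, 1] before feeding it to the generator.
    return np.stack([depth, edges], axis=0).astype(np.float32) / 255.0

# Hypothetical usage:
# x = build_two_channel_input("deficient_depth.png")  # shape: (2, H, W)
```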
结果
在Apollo scape数据集上与其他4种常用的GAN、不带边界信息的Edge-guided GAN进行实验分析。在输入尺寸为256×256像素,掩膜尺寸为32×32像素情况下,Edge-guided GAN的峰值信噪比(peak signal-to-noise ratio,PSNR)比性能第2的模型提高了15.76%;在掩膜尺寸为64×64像素情况下,Edge-guided GAN的PSNR比性能第2的模型提高了18.64%。
结论
Edge-guided GAN以待修复深度图像的边界信息为其修复的约束条件,有效地提取了待修复深度图像特征,大幅度地提高了深度图像修复的精度。
Objective
Depth images play an important role in robotics, 3D reconstruction, and autonomous driving. However, depth sensors, such as Microsoft Kinect and Intel RealSense, produce depth images with missing data. In some fields, such as those using high-definition maps for autonomous driving (including RGB images and depth images), objects not belonging to these maps (people, cars, etc.) should be removed. The corresponding areas are blank (i.e., missing data) after the objects are removed from the depth image. Therefore, depth images with missing data must be repaired before they can be used for 3D tasks. Depth image inpainting approaches can be divided into two groups: image-guided depth image inpainting and single-depth image inpainting. Image-guided approaches repair a depth image with information from the ground truth of its color image or from its previous or next frames; without this information, these approaches are useless. Single-depth image inpainting approaches repair a depth image without any information from color images. Currently, only a few studies have tackled this setting, mostly by exploiting and improving low-rank components of depth images, and such methods only repair depth images with sparse missing data rather than small or large holes. Generative adversarial network (GAN)-based approaches have been widely studied for RGB image inpainting and have achieved state-of-the-art (SOTA) results. However, to the best of our knowledge, no GAN-based approach has been reported for depth image inpainting. The reasons are as follows. On the one hand, a depth image records the distances to objects and lacks texture information, so some researchers have expressed concerns about whether convolutional neural networks (CNNs) can extract depth image features well. On the other hand, no public depth image datasets were available for training CNN-based approaches. Regarding the first concern, CNNs have been verified to extract depth image features effectively. Regarding the second concern, Baidu released the Apollo scape dataset in 2018, which contains 43 592 depth ground-truth images; these images are sufficient to explore GAN-based approaches for depth image inpainting. Therefore, we explore a single-depth image inpainting approach.
Method
In this paper, we propose a GAN called edge-guided GAN for depth image inpainting. We first obtain the edge image of the deficient depth image by using the Canny algorithm and then combine the deficient depth image and its edge image into two-channel data. These data are used as the input of the edge-guided GAN, and the output is the repaired depth image. The edge image provides the edge information of the deficient depth image, which guides the inpainting. The edge-guided GAN contains a generator and a discriminator. The generator is an encoder-decoder architecture designed for depth image inpainting. It first uses two asymmetric convolutional network (ACNet) layers and six residual blocks to extract depth image features and then uses two transposed convolution layers to generate the repaired depth image. ACNet can be trained to achieve better performance than standard square-kernel convolutional layers while using less GPU memory. The discriminator takes repaired depth images or ground-truth depth images as input and predicts whether the input is a true or a fake depth image. Its architecture is similar to that of PatchGAN and contains five standard convolution layers. The loss functions of the generator and the discriminator are designed as follows. The input of the discriminator includes the ground truth of the depth image and the depth image generated by the generator, so the discriminator loss can be separated into two parts. When the input is the ground truth, the discriminator loss is the binary cross-entropy (BCE) loss between its output and one; when the input is the generated depth image, the discriminator loss is the BCE loss between its output and zero. The total discriminator loss is the average of the sum of these two losses. The generator loss is the average L1 loss between the pixels of the deficient depth image and the pixels of the depth image after inpainting. The optimization goal of the edge-guided GAN is to minimize the generator loss and maximize the discriminator loss.
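As an illustration of the loss design summarized above, the following PyTorch-style sketch shows the two-part BCE discriminator loss and the mean L1 generator loss. The discriminator's sigmoid output assumed by BCELoss and the explicit adversarial term added to the generator are assumptions made for this sketch (reflecting the stated min-max objective), not details taken from the authors' code.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # assumes the discriminator ends with a sigmoid
l1 = nn.L1Loss()    # mean (average) L1 loss over pixels

def discriminator_loss(disc, real_depth, fake_depth):
    # BCE of the prediction against 1 for ground-truth depth images and
    # against 0 for generated ones; the total is the average of the two parts.
    pred_real = disc(real_depth)
    pred_fake = disc(fake_depth.detach())
    loss_real = bce(pred_real, torch.ones_like(pred_real))
    loss_fake = bce(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (loss_real + loss_fake)

def generator_loss(disc, fake_depth, target_depth):
    # Average L1 loss over pixels, as stated above. The adversarial term
    # (pushing the discriminator to predict "real" for generated images) is an
    # assumption that reflects the min-max objective described in the text.
    pred_fake = disc(fake_depth)
    adversarial = bce(pred_fake, torch.ones_like(pred_fake))
    return l1(fake_depth, target_depth) + adversarial
```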
Result
To verify the performance of our edge-guided GAN, we trained four commonly used methods and an edge-guided GAN without edge information for comparison. When the input size is 256×256 pixels and the mask size is 32×32 pixels, the peak signal-to-noise ratio (PSNR, higher is better) of the edge-guided GAN is 35.250 8, which is 15.76% higher than that of the second-best method. When the mask size is 64×64 pixels, the PSNR of the edge-guided GAN is 29.157 3, which is 18.64% higher than that of the second-best method. For all methods, the PSNR with 32×32 masks is higher than that with 64×64 masks. We also conducted an experiment to verify the performance of the edge-guided GAN on object removal. In this experiment, the objects to be removed were set as the mask, and the edge-guided GAN achieved SOTA results.
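For reference, PSNR values such as those above are conventionally computed as 10·log10(MAX²/MSE). A minimal sketch, assuming 8-bit depth images with a peak value of 255 (the bit depth actually used for evaluation is not stated in this section):

```python
import numpy as np

def psnr(reference, repaired, peak=255.0):
    # PSNR = 10 * log10(peak^2 / MSE); higher values indicate better inpainting.
    mse = np.mean((reference.astype(np.float64) - repaired.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical usage on a ground-truth / repaired depth image pair:
# score = psnr(gt_depth, out_depth)
```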
Conclusion
The proposed edge-guided GAN is a single-depth image inpainting approach with high accuracy. It takes the edge information of the deficient depth image as a constraint on inpainting, and its architecture and loss functions can effectively extract the features of the deficient depth image.
生成对抗网络；深度图像修复方法；Edge-guided GAN；边界信息；Apollo scape数据集
generative adversarial network (GAN); depth image inpainting approaches; Edge-guided GAN; edge information; the Apollo scape dataset
Aleotti F, Zaccaroni G, Bartolomei L, Poggi M, Tosi F and Mattoccia S. 2020. Real-time single image depth perception in the wild with handheld devices[EB/OL].[2020-06-10]. https://arxiv.org/pdf/2006.05724.pdf
Candès E J and Recht B. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6): 717-772[DOI:10.1007/s10208-009-9045-5]
Canny J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6): 679-698[DOI:10.1109/TPAMI.1986.4767851]
Chen W H, Yue H S, Wang J H and Wu X M. 2014. An improved edge detection algorithm for depth map inpainting. Optics and Lasers in Engineering, 55: 69-77[DOI:10.1016/j.optlaseng.2013.10.025]
Ding X H, Guo Y C, Ding G G and Han J G. 2019. ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1908.03930v3.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1406.2661.pdf
Grinvald M, Furrer F, Novkovic T, Chung J J, Cadena C, Siegwart R and Nieto J. 2019. Volumetric instance-aware semantic mapping and 3D object discovery. IEEE Robotics and Automation Letters, 4(3):3037-3044[DOI:10.1109/LRA.2019.2923960]
Han X, Zhang L H, Zhou K and Wang X N. 2019a. ProGAN: protein solubility generative adversarial nets for data augmentation in DNN framework. Computers and Chemical Engineering, 131: #106533[DOI:10.1016/j.compchemeng.2019.106533]
Han X G, Zhang Z X, Du D, Yang M D, Yu J M, Pan P, Yang X, Liu L G, Xiong Z X and Cui S G. 2019b. Deep reinforcement learning of volume-guided progressive view inpainting for 3D point scene completion from a single depth image[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1903.04019.pdf
Herrera C D, Kannala J, Ladický L and Heikkilä J. 2013. Depth map inpainting under a second-order smoothness prior//Proceedings of the 18th Scandinavian Conference on Image Analysis. Espoo, Finland: Springer: 555-566[DOI:10.1007/978-3-642-38886-6_52]
Huang X Y, Cheng X J, Geng Q C, Cao B B, Zhou D F, Wang P, Lin Y Q and Yang R G. 2018. The ApolloScape dataset for autonomous driving//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 954-960[DOI:10.1109/CVPRW.2018.00141]
Huang Z Y, Lv C, Xing Y and Wu J D. 2020. Multi-modal sensor fusion-based deep neural network for end-to-end autonomous driving with scene understanding[EB/OL].[2020-08-08]. https://arxiv.org/pdf/2005.09202.pdf
Iizuka S, Simo-Serra E and Ishikawa H. 2017. Globally and locally consistent image completion. ACM Transactions on Graphics, 36(4): #107[DOI:10.1145/3072959.3073659]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5967-5976[DOI:10.1109/CVPR.2017.632]
Kingma D P and Ba J. 2017. Adam: a method for stochastic optimization[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1412.6980.pdf
Li J W, Monroe W, Shi T L, Jean S, Ritter A and Jurafsky D. 2017. Adversarial learning for neural dialogue generation[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1701.06547.pdf
Liu G L, Reda F A, Shih K J, Wang T C, Tao A and Catanzaro B. 2018. Image inpainting for irregular holes using partial convolutions//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 89-105[DOI:10.1007/978-3-030-01252-6_6]
Liu H Y, Jiang B, Xiao Y and Yang C. 2019. Coherent semantic attention for image inpainting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE: 4169-4178[DOI:10.1109/ICCV.2019.00427]
Liu J Y, Gong X J and Liu J L. 2012. Guided inpainting and filtering for kinect depth maps//Proceedings of the 21st International Conference on Pattern Recognition. Tsukuba, Japan: IEEE: 2055-2058
Maas A L, Hannun A Y and Ng A Y. 2013. Rectifier nonlinearities improve neural network acoustic models//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA: JMLR: #3
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D and Riedmiller M. 2013. Playing Atari with deep reinforcement learning[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1312.5602.pdf
Nair V and Hinton G E. 2010. Rectified linear units improve restricted Boltzmann machines//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: ICML: 807-814[DOI:10.5555/3104322.3104425]
Nazeri K, Ng E, Joseph T, Qureshi F Z and Ebrahimi M. 2019. EdgeConnect: generative image inpainting with adversarial edge learning[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1901.00212.pdf
Pan J T, Ferrer C C, McGuinness K, O'Connor N E, Torres J, Sayrol E and Giro-i-Nieto X. 2018. SalGAN: visual saliency prediction with generative adversarial networks[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1701.01081.pdf
Pathak D, Krähenbühl P, Donahue J, Darrell T and Efros A A. 2016. Context encoders: feature learning by inpainting[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1604.07379.pdf
Shi F, Cheng J, Wang L, Yap P T and Shen D G. 2013. Low-rank total variation for image super-resolution//Proceedings of the 16th International Conference on Medical Image Computing and Computer-Assisted Intervention. Nagoya, Japan: Springer: 155-162[DOI:10.1007/978-3-642-40811-3_20]
Tateno K, Tombari F, Laina I and Navab N. 2017. CNN-SLAM: real-time dense monocular SLAM with learned depth prediction[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1704.03489.pdf
Telea A. 2004. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23-34[DOI:10.1080/10867651.2004.10487596]
Tian M, Nie Q and Shen H. 2020. 3D scene geometry-aware constraint for camera localization with deep learning[EB/OL].[2020-08-08]. https://arxiv.org/pdf/2005.06147.pdf
Wang B. 2019. Research on Robotic Grasping Detection Based on Depth Image and Deep Learning. Hangzhou: Zhejiang University
王斌. 2019. 基于深度图像和深度学习的机器人抓取检测算法研究. 杭州: 浙江大学
Wang N, Li J Y, Zhang L F and Du B. 2019. MUSICAL: multi-scale image contextual attention learning for inpainting//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: IJCAI: 3748-3754[DOI:10.24963/ijcai.2019/520]
Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J and Catanzaro B. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1711.11585.pdf
Xu T, Zhang P C, Huang Q Y, Zhang H, Gan Z, Huang X L and He X D. 2017. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1711.10485.pdf
Xue H Y, Zhang S M and Cai D. 2017. Depth image inpainting: improving low rank matrix completion with low gradient regularization. IEEE Transactions on Image Processing, 26(9): 4311-4320[DOI:10.1109/TIP.2017.2718183]
Yan Z Y, Li X M, Li M, Zuo W M and Shan S G. 2018. Shift-Net: image inpainting via deep feature rearrangement//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI:10.1007/978-3-030-01264-9_1]
Yang Z, Chen W, Wang F and Xu B. 2017. Improving neural machine translation with conditional sequence generative adversarial nets[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1703.04887.pdf
Yin J W. 2018. The Research of Robot Simultaneous Localization and Mapping Based on RGBD Camera. Chongqing: Chongqing University of Posts and Telecommunications
殷剑文. 2018. 基于RGBD的机器人同时定位与制图研究. 重庆: 重庆邮电大学[DOI:10.27675/d.cnki.gcydx.2018.000695]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T S. 2018. Generative image inpainting with contextual attention[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1801.07892.pdf
Yu L T, Zhang W N, Wang J and Yu Y. 2017. SeqGAN: sequence generative adversarial nets with policy gradient[EB/OL].[2020-08-08]. https://arxiv.org/pdf/1609.05473.pdf
Zhang H T, Yu J and Wang Z F. 2018. Probability contour guided depth map inpainting and superresolution using non-local total generalized variation. Multimedia Tools and Applications, 77(7): 9003-9020[DOI:10.1007/s11042-017-4791-x]
Zhang M L. 2018. RGB-D SLAM Algorithm of Indoor Mobile Robot. Harbin: Harbin Institute of Technology
张米令. 2018. 室内移动机器人RGB-D SLAM算法研究. 哈尔滨: 哈尔滨工业大学
Zhu X Y, Liu Y F, Li J H, Wan T and Qin Z C. 2018. Emotion classification with data augmentation using generative adversarial networks//Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining. Melbourne, Australia: Springer: 349-360[DOI:10.1007/978-3-319-93040-4_28]