Edge-guided GAN: edge-information-guided depth image inpainting

Liu Kunhua1, Wang Xuehui1, Xie Yuting1, Hu Jianyao2 (1. Institute of Unmanned Systems, School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; 2. The Fifth Electronics Research Institute of Ministry of Industry and Information Technology, Guangzhou 510610, China)

Abstract
Objective Most existing depth image inpainting methods fall into two categories: color-image-guided methods and single depth image inpainting methods. Color-image-guided methods repair a depth image using information provided by the ground truth of its color image or by its previous or next frames; without such information, these methods fail. Single depth image inpainting methods can repair depth images with only sparse missing data, but they cannot repair depth images containing holes (large regions of missing data). To address these problems, this paper applies the generative adversarial network (GAN) to depth image inpainting and proposes a GAN-based single depth image inpainting method, Edge-guided GAN. Method First, the edge image of the deficient depth image is obtained with the Canny algorithm, and the two single-channel images (the deficient depth image and its edge image) are combined into one 2-channel input. Second, a high-performance generator, discriminator, and loss functions are designed for Edge-guided GAN. The 2-channel data are fed to the generator to train it; the depth images generated by the generator (fake) and the ground-truth depth images (real) are fed to the discriminator to train it. The trained model then performs depth image inpainting. Result Edge-guided GAN is compared on the ApolloScape dataset with four commonly used GANs and with Edge-guided GAN without edge information. With a 256×256-pixel input and a 32×32-pixel mask, the peak signal-to-noise ratio (PSNR) of Edge-guided GAN is 15.76% higher than that of the second-best model; with a 64×64-pixel mask, its PSNR is 18.64% higher than that of the second-best model. Conclusion Edge-guided GAN takes the edge information of the deficient depth image as a constraint on inpainting, effectively extracts the features of the deficient depth image, and substantially improves the accuracy of depth image inpainting.
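To make this input format concrete, the following Python (OpenCV/NumPy) sketch extracts a Canny edge image from a deficient depth image and stacks the two single-channel images into one 2-channel array. The 256×256 input size is taken from the experiments; the Canny thresholds and the [0, 1] normalization are illustrative assumptions not specified in the abstract.

```python
import cv2
import numpy as np

def build_generator_input(depth_path, low_thresh=100, high_thresh=200):
    """Stack a deficient depth image and its Canny edge image into
    a single 2-channel array (assumed thresholds and normalization)."""
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE)  # deficient depth image, H x W
    depth = cv2.resize(depth, (256, 256))                 # input size used in the paper
    edges = cv2.Canny(depth, low_thresh, high_thresh)     # single-channel edge image
    # Normalize both channels to [0, 1] and stack along a new channel axis -> (2, H, W)
    pair = np.stack([depth / 255.0, edges / 255.0]).astype(np.float32)
    return pair
```

The resulting 2-channel array can then be batched and fed to the generator described below.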
Keywords
Edge-guided GAN: a depth image inpainting approach guided by edge information

Liu Kunhua1, Wang Xuehui1, Xie Yuting1, Hu Jianyao2 (1. Institute of Unmanned Systems, School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; 2. The Fifth Electronics Research Institute of Ministry of Industry and Information Technology, Guangzhou 510610, China)

Abstract
Objective Depth images play an important role in robotics, 3D reconstruction, and autonomous driving. However, depth sensors, such as Microsoft Kinect and Intel RealSense, produce depth images with missing data. In some fields, such as autonomous driving with high-definition maps (including RGB images and depth images), objects not belonging to these maps (people, cars, etc.) should be removed, and the corresponding areas of the depth image are blank (i.e., missing data) after removal. Therefore, depth images with missing data must be repaired before they can support such 3D tasks. Depth image inpainting approaches can be divided into two groups: image-guided and single-depth image inpainting approaches. Image-guided approaches repair a depth image using information from the ground truth of its color image or from its previous or next frames; without this information, these approaches are useless. Single-depth image inpainting approaches repair a depth image without any information from color images. Currently, only a few studies have tackled this issue, mainly by exploiting and improving the low-rank components of depth images, and existing single-depth image inpainting methods can only repair depth images with sparse missing data rather than small or large holes. Generative adversarial network (GAN)-based approaches have been widely researched for RGB image inpainting and have achieved state-of-the-art (SOTA) results. However, to the best of our knowledge, no GAN-based approach has been reported for depth image inpainting. The reasons are as follows. On the one hand, a depth image records the distances to different objects and lacks texture information, so some researchers have doubted whether convolutional neural networks (CNNs) can extract depth image features well. On the other hand, no public depth image datasets had been available for training CNN-based approaches. Regarding the first concern, it has since been verified that CNNs can extract depth image features. Regarding the second, Baidu released the ApolloScape dataset in 2018, which contains 43 592 ground-truth depth images; these are sufficient to explore GAN-based depth image inpainting. Therefore, we explore a single-depth image inpainting approach. Method In this paper, we propose a GAN called edge-guided GAN for depth image inpainting. We first obtain the edge image of the deficient depth image using the Canny algorithm and then combine the deficient depth image and its edge image into two-channel data. These data are used as the input of the edge-guided GAN, and the output is the repaired depth image. The edge image presents the edge information of the deficient depth image, which guides inpainting. The edge-guided GAN contains a generator and a discriminator. The generator is an encoder-decoder architecture designed for depth image inpainting. It first uses two asymmetric convolutional network (ACNet) layers and six residual block layers to extract depth image features and then uses two transposed convolution layers to generate the repaired depth image. ACNet layers can be trained to achieve better performance than standard square-kernel convolutional layers while using less GPU memory. The discriminator takes repaired depth images or ground-truth depth images as input and predicts whether the input is a true or a fake depth image.
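The following PyTorch sketch gives one plausible reading of this generator. Only the layer counts (two ACNet layers, six residual blocks, two transposed convolutions) and the encoder-decoder structure come from the abstract; the channel widths, kernel sizes, strides, and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ACBlock(nn.Module):
    """ACNet-style asymmetric convolution: a square kernel plus horizontal
    and vertical strip kernels, with the three branch outputs summed."""
    def __init__(self, c_in, c_out, k=3, stride=2):
        super().__init__()
        p = k // 2
        self.square = nn.Conv2d(c_in, c_out, k, stride, p)
        self.horizontal = nn.Conv2d(c_in, c_out, (1, k), stride, (0, p))
        self.vertical = nn.Conv2d(c_in, c_out, (k, 1), stride, (p, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.square(x) + self.horizontal(x) + self.vertical(x))

class ResBlock(nn.Module):
    """Plain residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Encoder-decoder generator: 2 ACNet layers -> 6 residual blocks
    -> 2 transposed convolutions producing the repaired depth image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(ACBlock(2, 64), ACBlock(64, 128))  # 2-channel input
        self.resblocks = nn.Sequential(*[ResBlock(128) for _ in range(6)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Sigmoid())           # 1-channel depth

    def forward(self, x):                                 # x: (N, 2, 256, 256)
        return self.decoder(self.resblocks(self.encoder(x)))  # (N, 1, 256, 256)
```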
The architecture of the discriminator is similar to that of PatchGAN and contains five standard convolution layers. The loss functions of the generator and the discriminator are designed as follows. The input of the discriminator is either the ground truth of the depth image or the depth image generated by the generator, so the discriminator loss has two parts. When the input is the ground truth, the discriminator loss is the binary cross entropy (BCE) loss between its output and one; when the input is a generated depth image, the discriminator loss is the BCE loss between its output and zero. The total discriminator loss is the average of the sum of these two losses. The generator loss is the mean L1 loss between the pixels of the deficient depth image and the pixels of the depth image after inpainting. The optimization goal of the edge-guided GAN is to minimize the generator loss and maximize the discriminator loss. Result To verify the performance of our edge-guided GAN, we trained four commonly used methods and the edge-guided GAN without edge information for comparison. When the input size is 256×256 pixels and the mask size is 32×32 pixels, the peak signal-to-noise ratio (PSNR, higher is better) of the edge-guided GAN is 35.2508, an increase of 15.76% over the second-best method. When the mask size is 64×64 pixels, the PSNR of the edge-guided GAN is 29.1573, an increase of 18.64% over the second-best method. For all methods, the PSNR with 32×32 masks is higher than that with 64×64 masks. We also conducted an experiment to verify the performance of the edge-guided GAN on object removal: the objects to be removed were set as the mask, and the edge-guided GAN achieved SOTA results. Conclusion The proposed edge-guided GAN is a single-depth image inpainting approach with high accuracy. It takes the edge information of the deficient depth image as a constraint, and its architecture and loss functions effectively extract the features of the deficient depth image.
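A minimal PyTorch sketch of the discriminator and these losses follows. Only the five-convolution, PatchGAN-like structure and the BCE/L1 formulation come from the abstract; the channel widths, kernel sizes, and the use of BCEWithLogitsLoss on raw scores are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """PatchGAN-like discriminator with five standard convolution layers;
    outputs a grid of real/fake scores (logits) over image patches."""
    def __init__(self):
        super().__init__()
        chans = [1, 64, 128, 256, 512]   # assumed widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, 2, 1),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv2d(512, 1, 4, 1, 1))  # fifth conv: patch scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

bce = nn.BCEWithLogitsLoss()  # binary cross entropy on raw scores
l1 = nn.L1Loss()              # mean pixel-wise L1

def discriminator_loss(d_real, d_fake):
    """Average of the BCE of real predictions against 1 and of fake
    predictions against 0, as described in the abstract."""
    return 0.5 * (bce(d_real, torch.ones_like(d_real))
                  + bce(d_fake, torch.zeros_like(d_fake)))

def generator_l1_loss(repaired, reference):
    """Mean L1 distance between the repaired depth image and a reference image."""
    return l1(repaired, reference)
```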
Keywords
