Published: 2019-07-16
DOI: 10.11834/jig.180620
2019 | Volume 24 | Number 7

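Image Understanding and Computer Vision

Multilateral adaptive depth image super-resolution reconstruction via RGB-D structure similarity measure

Li Qingsong, Zhang Xudong, Zhang Jun, Gao Xinjian, Gao Jun

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

Received: 2018-11-07; Revised: 2019-01-22

Supported by: National Natural Science Foundation of China (61876057, 61403116, 61806066)

First author: Li Qingsong, born in 1991, male, master's student; research interests: image processing and machine vision. E-mail: qingslee@126.com;
Zhang Jun, female, associate researcher; research interests: computer vision and machine learning. E-mail: zhangjun@hfut.edu.cn;
Gao Xinjian, male, lecturer; research interests: machine learning and artificial intelligence. E-mail: gaoxinjian@hfut.edu.cn;
Gao Jun, male, professor; research interests: intelligent information processing. E-mail: gaojun@hfut.edu.cn.

CLC number: TP391

Document code: A

Article ID: 1006-8961(2019)07-1160-16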

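Abstract

Objective Depth cameras can capture scene depth dynamically in real time, but the captured depth images have low resolution and are prone to holes. Using a high-resolution color image as guidance is an important approach to depth map super-resolution reconstruction. Existing methods struggle to resolve the inconsistency between color edges and depth discontinuities, which introduces texture copying artifacts into the reconstructed depth map. To address this problem, this paper proposes a robust color-guided depth map super-resolution reconstruction algorithm. Method First, exploiting the structural correlation between color image edges and depth image edges, an RGB-D structure similarity measure is proposed to detect the edge discontinuities shared by the color and depth images and to adaptively select the optimal image patch in the neighborhood of the pixel being estimated. Next, using the proposed oriented nonlocal means weight, depth is estimated under multilateral guidance within the selected patch, which resolves the structural inconsistency between color edges and depth discontinuities. Finally, the correspondence between the RGB-D structure similarity measure and image smoothness is used to adaptively adjust the parameters of the multilateral guidance weights, yielding robust depth map super-resolution reconstruction. Result Experiments on the Middlebury synthetic dataset, the ToF and Kinect datasets, and a dataset built for this paper show that the proposed method suppresses texture copying artifacts more effectively than other state-of-the-art methods. On the Middlebury, ToF, and Kinect datasets, the mean absolute deviation of the proposed method is on average about 63.51%, 39.47%, and 7.04% lower than that of the second-best algorithm. Conclusion On both synthetic datasets and real-scene depth datasets, the proposed method effectively handles the inconsistency between color edges and depth discontinuities and better preserves the discontinuity of depth edges.

Keywords

depth image; super-resolution; RGB-D structure similarity measure; multilateral guidance; adaptive model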

Multilateral adaptive depth image super-resolution reconstruction via RGB-D structure similarity measure

Li Qingsong, Zhang Xudong, Zhang Jun, Gao Xinjian, Gao Jun

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

Supported by: National Natural Science Foundation of China (61876057, 61403116, 61806066)

Abstract

Objective Depth cameras can capture depth images of dynamic scenes in real time, which gives them unique advantages in depth information acquisition. However, the captured depth images are often sensitive to noise and low in spatial resolution, and depth values are missing in some areas. Depth information and color edges are two complementary descriptions of the same scene and are therefore strongly correlated in structure: depth discontinuities often coincide with color transitions. Because the edges of low-resolution depth images are blurred, this color-depth correlation can be exploited for depth image reconstruction, and using a high-resolution color image as a reference is an important approach to reconstructing a high-resolution depth image. However, color images contain rich texture regions that depth images do not. One of the most challenging problems in color-guided depth image reconstruction is the inconsistency between color edges and depth discontinuities in textured regions, and simply transferring the structural information of the color image to the target image can introduce significant errors. Existing methods tend to consider only the color image and ignore its correlation with the depth image, so they fail to resolve the inconsistency, which results in texture copying artifacts and even blurred depth edges. In this paper, we propose a color-guided depth image super-resolution reconstruction algorithm that is robust to this inconsistency.
Method We propose an RGB-D structure similarity measure that uses the structural correlation between the color and depth images to predict the color edges most likely to coincide with depth discontinuities. To measure the inconsistency effectively, we examine local structural gradients rather than the gradient magnitude of individual pixels. The results show that the proposed measure is less affected by color texture. We use the measure as an indicator for adaptive image patch selection, since it effectively reflects the discontinuity of depth edges. A conventional image patch is centered on the pixel to be estimated; when that pixel lies in a depth edge region, the depth estimate is blurred by the changes in nearby depth gradients. In contrast, we select, among all image patches in the pixel neighborhood, the optimal patch that is least likely to contain prominent depth edges, which helps preserve sharp depth edges. The multilateral guided estimation of depth values is then performed in the selected patch. Exploiting the nonlocal characteristics of the color and depth images, we propose an oriented nonlocal means weighting scheme that uses high-quality structural gradients and directional information. Combined with the spatial and range kernels, this scheme forms the multilateral guidance for depth estimation, which effectively resolves the structural inconsistency, preserves depth discontinuities, and is robust to depth holes. Finally, the three bandwidth parameters of the multilateral guidance weighting scheme are key parameters of our reconstruction model. The proposed RGB-D structure similarity measure is related to depth image smoothness: it corresponds to depth discontinuity and is less affected by incoherent texture. Small bandwidth parameters preserve depth discontinuities well but smooth noise poorly, whereas large bandwidth parameters smooth noise well but may blur depth discontinuities. We therefore adjust the multilateral guidance weight parameters adaptively in accordance with the measure to achieve robust depth image reconstruction. In summary, our framework is based on multilateral guidance, and the correspondence between the proposed measure and image smoothness is used to adaptively select both the position of the neighborhood image patch and the size of the guidance weight parameters.
Result Quantitative and qualitative evaluations on the Middlebury synthetic, ToF real, and Kinect real datasets and on our own dataset show that our method performs well compared with other state-of-the-art methods. It effectively suppresses texture copying artifacts, restores depth hole images, and preserves depth discontinuities. We use the mean absolute difference, a commonly used metric for depth image reconstruction, for evaluation. Compared with the second-best algorithm, the mean absolute difference of the proposed method is lower by approximately 63.51%, 39.47%, and 7.04% on average on the Middlebury, ToF, and Kinect datasets, respectively. Furthermore, as the upsampling factor increases, the advantage of our reconstruction over other state-of-the-art methods becomes more evident, because we fully utilize the structural information of the color image; the other methods fail to handle the influence of color texture once the depth image information is no longer reliable. For depth hole images, most previous methods can only restore the depth image without increasing its resolution, or treat the two as separate steps. Our method restores depth holes during super-resolution reconstruction, and experiments on the raw NYU dataset verify its effectiveness.
Conclusion Our method effectively handles the inconsistency between color edges and depth discontinuities in color-guided depth image super-resolution reconstruction and effectively restores depth holes. In particular, it preserves depth discontinuities on real-world depth datasets as well as on synthetic datasets.

Key words

depth image; super-resolution; RGB-D structure similarity measure; multilateral guidance; adaptive model

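0 Introduction

Depth images capture the 3D information of a scene, and reliable depth information is used throughout computer vision, for example in 3D reconstruction^[1], robot navigation^[2], and virtual reality^[3]. Mainstream depth acquisition techniques fall roughly into two classes: passive^[4] and active^[5-6].

Passive methods rely on multi-view vision: for each pixel they find the corresponding pixel across two or more views and compute scene depth from the coordinate difference of the same scene point between viewpoints^[4]. Multi-view stereo remains limited mainly by pixel matching: in uniformly colored or textureless regions there are too few features^[7] to guarantee matching accuracy. Passive methods also usually require complex camera calibration.

Active methods compute the depth of target surfaces by emitting a specific light signal into the scene and receiving its reflection, and they can run in real time on dynamic scenes. One class is based on time-of-flight (ToF) technology, such as ToF cameras^[5], which illuminate the scene with a modulated light source and measure the phase delay of the reflected light; the particular sensing architecture of ToF cameras makes them noise-sensitive at edges and low in spatial resolution^[8]. The other class is based on structured light, such as the Kinect camera^[6], which derives depth from the changes in the light signal caused by object surfaces. When the background is occluded by the foreground, or because of the surface material, the resulting depth image contains many holes where depth values are missing. Active depth acquisition is usually constrained by lighting conditions, is mostly used indoors, and offers low resolution and a limited depth range^[9].

Although sensor-based depth capture keeps improving, the physical limits of these devices mean that their use is mainly restricted by low resolution, noise sensitivity, and holes, so producing high-quality depth maps has become an active research area^[10-11]. A representative approach uses a registered high-resolution color image as a reference to guide depth image super-resolution reconstruction^[12-13]. The basic idea of color-guided depth super-resolution^[13] is to transfer the structural information of the guidance image to the input depth image; the color image provides higher-quality edge guidance and does not require complex camera calibration. However, because color images contain rich texture, color edges and depth discontinuities are not always consistent^[14], and this structural inconsistency easily introduces texture copying artifacts, as in Fig. 1(c). In addition, considerable noise and even missing depth values occur near depth edges, so holes must be handled during reconstruction, as in Fig. 2(b); the upsampling factor here is 8×.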

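Within the color-guided depth reconstruction framework, this paper proposes a depth map super-resolution algorithm that is robust to incoherent color texture and also repairs holes effectively. To address the inconsistency between color edges and depth discontinuities, the proposed RGB-D structure similarity measure detects the edges in the color image that are consistent with depth discontinuities and extracts the edge structure shared by the color and depth images, which avoids interference from incoherent texture. Unlike conventional patches centered on the pixel being estimated, the measure is used to select the optimal patch in the pixel's neighborhood so that depth edges do not corrupt the depth estimate. Using the structural information of both the color and depth images, an oriented nonlocal means weight is proposed and combined with the spatial and range kernels to form multilateral guidance for depth estimation, which removes the interference caused by color-depth inconsistency. The measure is then used to adaptively adjust the bandwidth parameters of the guidance weights in different patches, giving stable and reliable depth map super-resolution reconstruction. For an upsampling factor $n$, the algorithm divides the whole upsampling process into ${\rm{log}}_2\;n$ sub-stages and reconstructs the depth map by a factor of 2× in each stage. The method can be applied to real-scene depth maps captured by ToF and Kinect cameras. The results in Fig. 1(d) and Fig. 2(d) show that the method suppresses texture copying artifacts while effectively handling holes.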
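The staged schedule described above (an overall factor $n$ split into $\log_2 n$ stages of 2×) can be illustrated with a minimal Python sketch. The function name `stage_factors` is hypothetical, and the sketch assumes $n$ is a power of two, as in the 2×, 4×, 8×, and 16× settings used in the experiments.

```python
import math

def stage_factors(n):
    """Split an overall upsampling factor n into log2(n) stages of 2x each.

    Assumes n is a positive power of two; other values are rejected.
    """
    if n < 1:
        raise ValueError("n must be a positive power of two")
    stages = int(round(math.log2(n)))
    if 2 ** stages != n:
        raise ValueError("n must be a positive power of two")
    return [2] * stages

# An 8x reconstruction runs three consecutive 2x stages.
print(stage_factors(8))
```

Each stage would feed its 2× result to the next stage as the new low-resolution input.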

Fig. 1 Texture copying artifacts ((a) color image; (b) ground truth; (c) guided filter^[15]; (d) our method)

Fig. 2 Depth holes ((a) color image; (b) raw depth hole image; (c) total generalized variation^[16]; (d) our method)

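The main contributions of this paper are as follows. 1) By combining the structural information of the color and depth images, an RGB-D structure similarity measure is proposed that extracts the edges in the color image that are consistent with depth discontinuities. 2) Using this measure, an adaptive patch selection method is proposed that finds the optimal patch in a pixel's neighborhood for depth estimation. 3) Using nonlocal information from the color and depth images, an oriented nonlocal means weight is proposed for multilateral guided depth estimation, which effectively suppresses texture copying artifacts. In addition, the bandwidth parameters of the weights are adapted to each patch region, which makes depth recovery more robust.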

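1 Related work

Conventional optical cameras provide high-resolution color images, and using a registered color image as guidance for depth image super-resolution is an active topic in depth map reconstruction. The underlying assumption is that color edges and depth discontinuities in a scene are correlated, but because of color texture and similar effects, color and depth are inconsistent in some regions. Work on color-guided depth map super-resolution can be divided into global methods^{[10, 16-20]} and local methods^{[13, 15, 21-26]}.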

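1.1 Global methods

Global methods cast depth reconstruction as a constrained optimization problem solved by minimizing a loss function. For example, Ref. [17] uses a two-layer Markov random field (MRF) for upsampling, in which the data term measures the difference between the estimated and the measured depth image and the weights of the smoothness term are computed from the color image; this method easily over-smooths edges. Harrison et al.^[18] added a second-order smoothness constraint to the MRF method^[17] to better handle planar and curved regions. Neither Ref. [17] nor Ref. [18] addresses the inconsistency between color and depth. Park et al.^[19] proposed a nonlocal means scheme with an additional edge weighting scheme based on high-resolution image features to preserve fine structures and depth discontinuities. The edge weighting combines segmentation, color similarity, edge saliency, and the interpolated depth map, using depth information as a guidance weight in an attempt to resolve the inconsistency, but the interpolated depth map is noisy, which makes depth discontinuities hard to preserve. Ferstl et al.^[16] defined a total generalized variation (TGV) model whose regularization term is weighted by an anisotropic diffusion tensor computed from the registered color image to guide depth upsampling; because it relies too heavily on color information, it introduces texture copying artifacts. Yang et al.^[20] defined an auto-regressive (AR) model through an adaptive auto-regressive predictor that exploits nonlocal similarity in the color image and local correlation in the interpolated depth map, which alleviates the color-depth inconsistency to some extent. Ref. [10] proposed a task-driven training strategy that updates the stage-wise parameters of the guidance information from the continually updated depth map to perform task-specific depth reconstruction, but it does not effectively resolve the color-depth inconsistency.

Global methods exploit global image information and produce depth maps with better overall visual quality, but they are computationally expensive. Because the color image carries a larger share of the loss, they have difficulty resolving the color-depth inconsistency and often introduce texture copying artifacts.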

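1.2 Local methods

Local methods mainly exploit local properties of the color image to improve the resolution and accuracy of the depth map. Kopf et al.^[13] proposed the joint bilateral filter (JBF) for depth upsampling, based on spatial distance and intensity difference, but did not consider the influence of incoherent texture in the color image. Chan et al.^[21] extended the conventional joint bilateral filter into a noise-aware multilateral upsampling filter in an attempt to address the inconsistency between color texture and depth discontinuities, but the filter is strongly affected by noise and has difficulty handling texture in real depth maps. Yang et al.^[22] formulated an iterative depth refinement module using bilateral filtering of a cost volume built from the depth probability distribution, followed by winner-takes-all (WTA) selection and sub-pixel depth refinement, which preserves depth discontinuities. Building on similar work, Fukushima et al.^[23] formulated upsampling as two stages, accelerating cost-volume filtering while suppressing edge scattering and blurring; the optimized algorithm^[23] runs in real time on a GPU. However, neither Ref. [22] nor Ref. [23] addresses the inconsistency between color edges and depth discontinuities. Min et al.^[24] proposed weighted mode filtering (WMF), which seeks the global mode of a joint histogram to suppress the effect of aliasing on depth upsampling, and further applied the method to depth video by exploiting temporal consistency. Zuo et al.^[25] extended weighted mode filtering^[24] to a refined weighted mode filter using a confidence map and patch-based nonlocal means to obtain sharp depth discontinuities and thereby address texture copying artifacts, although the effect is not guaranteed. He et al.^[15] designed an efficient edge-preserving smoothing filter derived from a local linear model, but artifacts appear in edge regions. Ref. [26] incorporated adaptive edge-aware weights into the guided filter (GF)^[15]; the resulting weighted guided image filter (WGIF) has low complexity but suppresses artifacts only to a limited extent.

Local methods are more efficient than global methods, but they usually consider only local properties of the color and depth images, cannot fully exploit global image information, and give little attention to the inconsistency between color edges and depth discontinuities.

This paper uses a high-resolution color image as the reference for a low-quality depth map to guide depth map super-resolution reconstruction. It effectively resolves the inconsistency between color edges and depth discontinuities, and the adaptive model makes depth reconstruction more robust; the method is simple and effective and can be applied to dynamic scenes. The algorithm exploits the structural relationship between the high-resolution color image and the depth map and, through the proposed RGB-D structure similarity measure, obtains high-quality structural edge information for the depth map. The measure is also used to select the optimal image patch in each pixel's neighborhood; depth is then estimated under multilateral guidance that jointly uses the nonlocal properties of the color and depth images, and a parameter adaptation scheme suppresses texture copying artifacts and preserves depth discontinuities.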

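2 Proposed method

Given an input low-resolution depth image $\mathit{\boldsymbol{D}}_L$ and a registered high-resolution color image $\mathit{\boldsymbol{G}}$, where pixel $p_L$ at position $(i, j)$ in $\mathit{\boldsymbol{D}}_L$ corresponds to pixel $p$ at position $(i_t, j_t)$ in $\mathit{\boldsymbol{G}}$ and $t$ is the upsampling factor, the algorithm reconstructs a high-resolution depth map $\mathit{\boldsymbol{D}}$ from the sparse known depth values in $\mathit{\boldsymbol{D}}_L$, guided by the additional structural information of the color image. This paper adopts a filtering-based super-resolution approach and estimates the depth of each pixel over an $m \times m$ image patch; the reconstructed depth map $\mathit{\boldsymbol{D}}$ is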

$ \mathit{\boldsymbol{D}}\left( p \right) = \sum\limits_{{q_L} \in {\mathit{\boldsymbol{ \boldsymbol{\varOmega} }}_{{p_L}}}} K \cdot {\mathit{\boldsymbol{D}}_L}\left( {{q_L}} \right) $

(1)

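where $p_L$ and $q_L$ are pixel coordinates in the low-resolution image $\mathit{\boldsymbol{D}}_L$, $\mathit{\pmb{\Omega}}_{p_{L}}$ is a neighborhood patch containing $p_L$, and $K$ is a kernel function that measures pixel similarity and supplies the key structural guidance. A well-chosen $K$ can resolve the inconsistency between color edges and depth discontinuities and suppress texture copying artifacts. The joint bilateral upsampling algorithm of Kopf et al.^[13] uses a similar framework, in which a spatial kernel and a range (intensity) kernel form the kernel $K$: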

$ K = \frac{1}{{{k_p}}}\exp \left( {\frac{{ - \left\| {{p_L} - {q_L}} \right\|_2^2}}{{2\sigma _s^2}}} \right)\exp \left( {\frac{{ - \left\| {{\mathit{\boldsymbol{G}}_p} - {\mathit{\boldsymbol{G}}_q}} \right\|_2^2}}{{2\sigma _r^2}}} \right) $

(2)

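where $p$ and $q$ are pixel coordinates in the high-resolution image, $\mathit{\boldsymbol{G}}_p$ and $\mathit{\boldsymbol{G}}_q$ are the pixel intensities at $p$ and $q$, $\sigma_s$ and $\sigma_r$ are the bandwidths of the spatial and range kernels, respectively, and $k_p$ is a normalization term.

The kernel used in this filtering framework considers only the local pixel similarity of the color image and ignores the inconsistency between color edges and depth discontinuities. The bandwidths $\sigma_s$ and $\sigma_r$ are fixed throughout the computation, which makes it hard to meet the smoothness requirements of different patches.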
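To make Eqs. (1) and (2) concrete, the following Python sketch estimates the depth of a single high-resolution pixel with the joint bilateral kernel. It is an illustration under stated assumptions, not the authors' implementation: the guidance is a scalar intensity, images are stored as dictionaries keyed by `(row, col)`, and the name `jbu_estimate` and its arguments are hypothetical.

```python
import math

def jbu_estimate(p, p_low, neighbors, guide, depth_low, sigma_s, sigma_r):
    """Estimate D(p) as in Eq. (1) with the kernel K of Eq. (2).

    p         : high-resolution coordinate of the pixel to estimate
    p_low     : its low-resolution coordinate p_L
    neighbors : list of (q_low, q) pairs, where q_low is a low-resolution
                neighbor and q its matching high-resolution coordinate
    guide     : dict, high-resolution coordinate -> scalar intensity (assumed)
    depth_low : dict, low-resolution coordinate -> depth value
    """
    weighted_sum = 0.0
    k_p = 0.0  # normalization term
    for q_low, q in neighbors:
        spatial = (p_low[0] - q_low[0]) ** 2 + (p_low[1] - q_low[1]) ** 2
        intensity = (guide[p] - guide[q]) ** 2
        w = math.exp(-spatial / (2.0 * sigma_s ** 2)) \
            * math.exp(-intensity / (2.0 * sigma_r ** 2))
        weighted_sum += w * depth_low[q_low]
        k_p += w
    return weighted_sum / k_p

# Two low-resolution samples on opposite sides of a color edge (2x upsampling).
guide = {(0, 0): 0.0, (0, 2): 1.0}
depth_low = {(0, 0): 1.0, (0, 1): 5.0}
neighbors = [((0, 0), (0, 0)), ((0, 1), (0, 2))]
estimate = jbu_estimate((0, 0), (0, 0), neighbors, guide, depth_low,
                        sigma_s=1.0, sigma_r=0.1)
print(estimate)
```

With a small $\sigma_r$ the sample across the color edge receives almost no weight, so the estimate stays close to the depth on the pixel's own side; when the guidance is flat, only the spatial kernel distinguishes the samples.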

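To address these problems, this paper proposes a robust depth map super-resolution reconstruction algorithm. The algorithm performs 2× upsampling stage by stage; the steps of each stage are shown in Fig. 3.

Fig. 3 The pipeline of our method

1) RGB-D structure similarity measure. The measure is used to find the discontinuities in the color image that are consistent with depth edges, avoiding interference from texture.

2) Adaptive patch selection. Based on the measure, the optimal image patch in the neighborhood of the pixel being estimated is selected, excluding patches that contain edges and preserving depth discontinuities.

3) Oriented nonlocal means weight. Within the selected patch, the oriented nonlocal means weight is incorporated to build a depth upsampling model under multilateral guidance.

4) Parameter adaptation. The measure is used to adaptively adjust the bandwidth parameters of the upsampling model.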

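2.1 Oriented nonlocal means weight

To limit the effect of the inconsistency between color edges and depth discontinuities on depth estimation, this paper combines the nonlocal structural information of the color and depth images and proposes an oriented nonlocal means weight term $W_{\rm{N}}(p, q)$. Combined with the spatial and range kernels, it forms the multilateral guidance weight that serves as the basic framework for depth estimation. The new kernel function is defined as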

$ \begin{array}{*{20}{c}} {K = \frac{1}{{{k_p}}}\exp \left( {\frac{{ - \left\| {{p_L} - {q_L}} \right\|_2^2}}{{2\sigma _s^2}}} \right) \times }\\ {\exp \left( {\frac{{ - \left\| {{\mathit{\boldsymbol{G}}_p} - {\mathit{\boldsymbol{G}}_q}} \right\|_2^2}}{{2\sigma _r^2}}} \right){W_N}\left( {p,q} \right)} \end{array} $

(3)

where $W_{\rm{N}}(p, q)$ is the nonlocal term, which measures the similarity of $p$ and $q$ lying on the same structure in the depth image, defined as

$ {W_{\rm{N}}}\left( {p,q} \right) = \left( {1 - {\alpha _p}} \right){W_{\rm{T}}}\left( {p;q} \right) + {\alpha _p}{W_{\rm{T}}}\left( {q;p} \right) $

(4)

where the weights $W_{\rm{T}}(p; q)$ and $W_{\rm{T}}(q; p)$ are asymmetric in $p$ and $q$; $W_{\rm{T}}(q; p)$ is defined analogously to $W_{\rm{T}}(p; q)$, which is

$ \begin{array}{*{20}{c}} {{W_{\rm{T}}}\left( {p;q} \right) = \exp \left\{ { - {{\left( {{\mathit{\boldsymbol{D}}_p} - {\mathit{\boldsymbol{D}}_q}} \right)}^{\rm{T}}} \times } \right.}\\ {\left. {{{\left( {{J_p} + \varepsilon } \right)}^{ - 1}}\left( {{\mathit{\boldsymbol{D}}_p} - {\mathit{\boldsymbol{D}}_q}} \right)/\sigma _n^2} \right\}} \end{array} $

(5)

where $q$ is a neighboring pixel of $p$, $\mathit{\boldsymbol{D}}$ is the depth map obtained by interpolation upsampling of the low-resolution depth map $\mathit{\boldsymbol{D}}_L$, ${\sigma}_{n}^{2}$ is a bandwidth parameter, and $\varepsilon$ is a small positive constant that guards against $J_p$ being zero. Following the idea of the local structure tensor^[27], this paper defines the nonlocal structure-aware matrix component $J_p$. With $\boldsymbol{G}_{\nabla}\left(p^{\prime}\right)=\left[\boldsymbol{G}_{\nabla}^{x}\left(p^{\prime}\right) \cos \left(\theta_{p}\right), \boldsymbol{G}_{\nabla}^{y}\left(p^{\prime}\right) \sin \left(\theta_{p}\right)\right]$ denoting the directional gradient vector for $p$, $J_p$ is defined as

$ {J_p} = \frac{1}{{{{\left\| \mathit{\boldsymbol{A}} \right\|}_0}}}\sum\limits_{p' \in \mathit{\boldsymbol{A}}\left( p \right)} {{\mathit{\boldsymbol{G}}_\nabla }\left( {p'} \right){\mathit{\boldsymbol{G}}_\nabla }{{\left( {p'} \right)}^{\rm{T}}}} $

(6)

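where $\mathit{\boldsymbol{A}}$ is a neighborhood centered at $p$ and $\|\mathit{\boldsymbol{A}}\|_0$ is the ${\rm{L}}_0$ norm of $\mathit{\boldsymbol{A}}$, that is, the number of pixels it contains. This weighting scheme is called the oriented nonlocal means weight (ONMW) in this paper; it represents the similarity of $p$ and $q$ lying on the same depth structure. In Eq. (4),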

$ {\alpha _p} = \frac{1}{{1 + \exp \left( { - {\sigma _\alpha }\left( {JWIV\left( p \right) - JWIV\left( {p'} \right)} \right)} \right)}} $

(7)

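where $p'$ is the center of the optimal neighborhood patch of the pixel $p$ being estimated, and $JWIV(p)$ is the value at $p$ of the RGB-D structure similarity measure proposed in this paper; $JWIV(p) - JWIV(p')$ characterizes the local structural difference. The weight $\alpha_p \in [0.5, 1)$ is larger in depth edge regions. Edges of real depth maps are often noisy; when $p$ lies in an edge region, $W_{\rm{T}}(q; p)$ acts as a nonlocal means denoiser, and $\alpha_p$ therefore takes a larger value. The parameter $\sigma_\alpha$ controls the transition band from smooth regions to depth discontinuities: the larger $\sigma_\alpha$ is, the more sensitive $\alpha_p$ is to local structural differences. Fig. 4 compares 8× depth reconstruction with and without the oriented nonlocal means weight. In the blue rectangle, the background of the color image has a color similar to that of the object and the depth map has holes along the edge; the weight clearly improves reconstruction of the depth discontinuity.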
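The logistic mixing weight of Eq. (7) and the blend of the two asymmetric terms can be sketched in a few lines of Python. The sketch takes $W_{\rm{T}}(p; q)$ and $W_{\rm{T}}(q; p)$ as already-computed numbers and pairs $\alpha_p$ with $W_{\rm{T}}(q; p)$, following the description that this term dominates in edge regions; the function names are hypothetical.

```python
import math

def alpha_p(jwiv_p, jwiv_center, sigma_alpha):
    """Mixing weight of Eq. (7); equals 0.5 when JWIV(p) == JWIV(p')."""
    return 1.0 / (1.0 + math.exp(-sigma_alpha * (jwiv_p - jwiv_center)))

def onmw_weight(w_t_pq, w_t_qp, alpha):
    """Blend the asymmetric weights W_T(p;q) and W_T(q;p) with weight alpha."""
    return (1.0 - alpha) * w_t_pq + alpha * w_t_qp

# Smooth region: p and the patch center p' have the same JWIV value.
a_smooth = alpha_p(0.3, 0.3, sigma_alpha=10.0)
# Edge region: JWIV(p) is much larger than JWIV(p').
a_edge = alpha_p(0.9, 0.1, sigma_alpha=10.0)
print(a_smooth, a_edge, onmw_weight(0.2, 0.8, a_edge))
```

Because $p'$ is chosen as the center of a low-JWIV patch, the difference $JWIV(p) - JWIV(p')$ is expected to be non-negative, which is what keeps $\alpha_p$ in $[0.5, 1)$.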

Fig. 4 Comparison of depth image reconstruction results without and with the ONMW weight ((a) color image and ground truth; (b) without the ONMW; (c) with the ONMW)

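2.2 RGB-D structure similarity measure

Building on the basic depth estimation framework above, and to address the problems that the inconsistency between color edges and depth discontinuities causes for the adaptive model, this paper proposes a structure similarity measure that detects the edge regions in the color image that are consistent with depth discontinuities. It serves as the key criterion for both local optimal patch selection and parameter adaptation. The algorithm considers not only the gradient magnitude of individual pixels but also aggregates local information about color edges and depth discontinuities. To reduce computational cost, the color image is downsampled to the resolution of the depth image; for ease of description, $\mathit{\boldsymbol{G}}$ and $\mathit{\boldsymbol{D}}$ below denote the downsampled color image and the depth image. The proposed structure similarity measure is defined as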

$ \begin{array}{*{20}{c}} {JWIV\left( p \right) = \left[ {{\mathit{\boldsymbol{G}}_x}\left( p \right){\mathit{\boldsymbol{G}}_y}\left( p \right)} \right] \times }\\ {{{\left[ {{\mathit{\boldsymbol{D}}_x}\left( p \right){\mathit{\boldsymbol{D}}_y}\left( p \right)} \right]}^{\rm{T}}}\Delta \left( {{\mathit{\boldsymbol{ \boldsymbol{\varOmega} }}_p}} \right)} \end{array} $

(8)

$ {\mathit{\boldsymbol{G}}_x}\left( p \right) = \left| {\sum\limits_{q \in {\mathit{\boldsymbol{ \boldsymbol{\varOmega} }}_p}} {{g_{p,q}}} \cdot {{\left( {{\partial _x}\mathit{\boldsymbol{G}}} \right)}_q}} \right| $

(9)

$ {\mathit{\boldsymbol{G}}_y}\left( p \right) = \left| {\sum\limits_{q \in {\mathit{\boldsymbol{ \boldsymbol{\varOmega} }}_p}} {{g_{p,q}}} \cdot {{\left( {{\partial _y}\mathit{\boldsymbol{G}}} \right)}_q}} \right| $

(10)

$ {g_{p,q}} = \exp \left( { - \frac{{{{\left( {{x_p} - {x_q}} \right)}^2} + {{\left( {{y_p} - {y_q}} \right)}^2}}}{{2\sigma _g^2}}} \right) $

(11)

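where $\mathit{\boldsymbol{G}}_x(p)$ and $\mathit{\boldsymbol{G}}_y(p)$ are the absolute values of the weighted sums of partial derivatives in the horizontal ($x$) and vertical ($y$) directions within the local patch $\mathit{\pmb{\Omega}}_p$, $g_{p,q}$ is a distance-based weighting function, and $x_p$ and $y_p$ are pixel coordinates; $\mathit{\boldsymbol{D}}_x(p)$ and $\mathit{\boldsymbol{D}}_y(p)$ are defined analogously to $\mathit{\boldsymbol{G}}_x(p)$ and $\mathit{\boldsymbol{G}}_y(p)$. To reflect the absolute magnitude of local patches in smooth and discontinuous regions, the variance within the patch of the depth image is used as a scale factor, $\Delta \left( \mathit{\pmb{\Omega}}_p \right) = {\rm{Var}}\left( \mathit{\boldsymbol{D}}\left( \mathit{\pmb{\Omega}}_p \right) \right)$, which is also robust to depth noise. This paper calls the RGB-D structure similarity measure the joint windowed inherent variation (JWIV). The magnitudes of $\mathit{\boldsymbol{G}}_x(p)$ and $\mathit{\boldsymbol{G}}_y(p)$ depend on whether the gradients in the window share a consistent direction. A window containing a depth edge structure is generally more likely than a window with complex texture to produce gradients in the same direction, so JWIV is large in depth edge regions and small in textured and smooth regions. Fig. 5(c) and Fig. 5(d) show the windowed inherent variation of the color and depth images, respectively, with lighter shades indicating larger values; the upsampling factor here is 8×. Fig. 5(e) shows the proposed joint windowed inherent variation, which effectively extracts the edges in the color image that are consistent with depth discontinuities and thus resolves the color-depth inconsistency.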
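As an illustration of Eqs. (8) to (11), the sketch below computes JWIV at one pixel of small grayscale images stored as lists of rows. Several details are assumptions made for the example rather than choices stated in the paper: forward differences for the partial derivatives, a square window of radius 1, a scalar (grayscale) guidance image, and the helper names.

```python
import math

def forward_gradients(img):
    """Forward differences in x and y (assumed discretisation; zero at the far border)."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)] for y in range(h)]
    return gx, gy

def window(py, px, radius, h, w):
    """Coordinates of the square window around (py, px), clipped to the image."""
    return [(y, x)
            for y in range(max(0, py - radius), min(h, py + radius + 1))
            for x in range(max(0, px - radius), min(w, px + radius + 1))]

def weighted_abs_sum(grad, py, px, coords, sigma_g):
    """|sum_q g_{p,q} * grad_q| with the Gaussian weight of Eq. (11)."""
    total = 0.0
    for y, x in coords:
        g = math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * sigma_g ** 2))
        total += g * grad[y][x]
    return abs(total)

def jwiv(color, depth, py, px, radius=1, sigma_g=1.0):
    """JWIV(p) as in Eq. (8): [G_x G_y] . [D_x D_y]^T scaled by the depth variance."""
    h, w = len(depth), len(depth[0])
    coords = window(py, px, radius, h, w)
    cgx, cgy = forward_gradients(color)
    dgx, dgy = forward_gradients(depth)
    g_x = weighted_abs_sum(cgx, py, px, coords, sigma_g)
    g_y = weighted_abs_sum(cgy, py, px, coords, sigma_g)
    d_x = weighted_abs_sum(dgx, py, px, coords, sigma_g)
    d_y = weighted_abs_sum(dgy, py, px, coords, sigma_g)
    vals = [depth[y][x] for y, x in coords]
    mean = sum(vals) / len(vals)
    variance = sum((v - mean) ** 2 for v in vals) / len(vals)
    return (g_x * d_x + g_y * d_y) * variance

# Toy 5x5 inputs: a vertical step edge, a flat image, and a checkerboard texture.
edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
flat = [[0.5] * 5 for _ in range(5)]
checker = [[float((x + y) % 2) for x in range(5)] for y in range(5)]

print(jwiv(edge, edge, 2, 2))      # color edge aligned with a depth edge
print(jwiv(checker, flat, 2, 2))   # color texture over flat depth
print(jwiv(checker, edge, 2, 2))   # color texture over a depth edge
```

In the checkerboard window the signed gradients largely cancel before the absolute value is taken, which is the mechanism that keeps the measure small in textured regions.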

Fig. 5 Comparison of windowed inherent variation with different images ((a) color image; (b) ground truth; (c) windowed inherent variation of color image; (d) windowed inherent variation of depth image; (e) joint windowed inherent variation (JWIV))

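2.3 Adaptive patch selection

With the proposed RGB-D structure similarity measure, the optimal image patch in a pixel's neighborhood can be found, and the depth of the pixel is then estimated under multilateral guidance within that patch using the basic framework of Eq. (1). A conventional patch is centered on the pixel being estimated. As shown in Fig. 6(a), two adjacent pixels $e$ and $p$ lie on opposite sides of a depth edge, and their patches $\mathit{\pmb{\Omega}}_e$ and $\mathit{\pmb{\Omega}}_p$ (with $e$ and $p$ the centers of the green and red patches, respectively) straddle the edge, so the depth estimates for both are inaccurate. The task is to find the optimal neighborhood patch $\mathit{\pmb{\Omega}}_q$ that contains $p$ and is least likely to contain a depth edge, where $q \in \mathit{\pmb{\Omega}}_p$. Because of color texture, color information cannot be used directly for patch selection; instead, the RGB-D structure similarity measure, which reflects depth discontinuity, is used as the criterion. Given a pixel $p$ and a patch size of $m \times m$, there are $m^2$ patches that contain $p$; the joint windowed inherent variation of each of these $m^2$ patches is computed, and the patch with the smallest value is selected as the optimal patch for the estimated point. In Fig. 6(b), the red patch is selected as the optimal patch for $p$, which avoids interference from the region on the other side of the edge.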

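Fig. 6 Illustrations of the conventional patch and our proposed adaptive patch ((a) conventional patches; (b) and (c) adaptive patches; (d) combined with affinity distance term)

Real cases are more complicated. In Fig. 6(c), the green patch covers more of the region whose depth is close to that of the pixel being estimated and is therefore better suited to computing its depth, but the red patch contains more smooth area and would be chosen as optimal. To handle this, an affinity distance term is added to refine patch selection. Let $\mathit{\boldsymbol{\kappa}}$ be a path connecting the pixel $p$ being estimated and a neighboring pixel $q$, with the pixels on the path written as $p = x_\kappa(1), x_\kappa(2), x_\kappa(3), \ldots, x_\kappa\left( \|\mathit{\boldsymbol{\kappa}}\|_0 \right) = q$, where $\|\mathit{\boldsymbol{\kappa}}\|_0$ is the ${\rm{L}}_0$ norm of $\mathit{\boldsymbol{\kappa}}$, that is, the number of pixels on the path, and $\sigma_d$ is a bandwidth parameter. The affinity distance between $p$ and $q$ is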

$ d\left( {p,q} \right) = \sum\limits_{i = 2}^{{{\left\| \mathit{\boldsymbol{\kappa }} \right\|}_0}} {\exp \left( {\frac{{ - \left\| {\mathit{\boldsymbol{G}}\left( {{x_\kappa }\left( i \right)} \right) - \mathit{\boldsymbol{G}}\left( {{x_\kappa }\left( 1 \right)} \right)} \right\|_2^2}}{{2\sigma _d^2}}} \right)} $

(12)

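Fig. 6(d) illustrates the effectiveness of this term. The red and green points are the centers of their respective patches. Because the black point lies in a region similar to that of the green point, with no discontinuity between them, its affinity distance to the green point is smaller than its distance to the red point.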

$ Z\left( p \right) = \mathop {\arg \min }\limits_q \left( {JWIV\left( q \right) + \lambda \cdot d\left( {p,q} \right)} \right) $

(13)

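For each pixel $p$, the position of the optimal neighborhood patch is obtained by minimizing the cost function in Eq. (13), where $\lambda$ is a hyperparameter that balances the RGB-D structure similarity term and the affinity distance term.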
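The selection rule of Eq. (13) can be sketched as a search over the centers of all $m \times m$ patches that contain $p$. The Python sketch below assumes a precomputed JWIV map and takes the affinity distance $d(p, q)$ as a caller-supplied function; the squared Euclidean distance used in the demonstration is only a hypothetical stand-in, not Eq. (12), and the function names are illustrative.

```python
def select_patch_center(p, jwiv_map, affinity, m, lam):
    """Return the center q minimising JWIV(q) + lam * d(p, q), as in Eq. (13).

    Candidates are the centers of all m x m patches that contain p, i.e. the
    pixels within m // 2 of p in each direction (clipped to the image).
    """
    py, px = p
    half = m // 2
    h, w = len(jwiv_map), len(jwiv_map[0])
    best_q, best_cost = None, float("inf")
    for qy in range(max(0, py - half), min(h, py + half + 1)):
        for qx in range(max(0, px - half), min(w, px + half + 1)):
            cost = jwiv_map[qy][qx] + lam * affinity(p, (qy, qx))
            if cost < best_cost:
                best_q, best_cost = (qy, qx), cost
    return best_q

def squared_distance(p, q):
    """Hypothetical stand-in for the affinity distance d(p, q)."""
    return float((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# JWIV is largest along a vertical depth edge in column 2.
jwiv_map = [[0.0, 0.5, 1.0, 0.5, 0.0] for _ in range(5)]
p = (2, 1)

# Without the distance term the search moves away from the edge.
print(select_patch_center(p, jwiv_map, lambda a, b: 0.0, m=3, lam=0.0))
# With the distance term, staying close to p can win.
print(select_patch_center(p, jwiv_map, squared_distance, m=3, lam=1.0))
```

The value of $\lambda$ sets the trade-off: a small value favors the smoothest patch, while a larger value keeps the selected patch close to $p$ in the sense of the chosen distance.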

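2.4 Parameter adaptation

Depth is estimated with the kernel function $K$; the bandwidth parameters $\sigma_s$, $\sigma_r$, and $\sigma_n$ in Eqs. (3) and (5) are the three key parameters of the algorithm. Their values can be adjusted adaptively using the proposed RGB-D structure similarity measure. Small bandwidths preserve depth discontinuities better but smooth noise poorly; large bandwidths smooth noise better but blur depth discontinuities. Choosing bandwidths adaptively according to the smoothness of the depth region therefore preserves depth discontinuities while smoothing noise. Fig. 7 shows the parameter adaptation curves.

Fig. 7 Parameter adaptive function curve

The parameter adaptation function $\sigma$ is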

$ \sigma = a + b \cdot \exp \left( { - \varepsilon \cdot \left( {r + \tau } \right)} \right) $

(14)

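where $r$ is the JWIV value of the optimal patch of the pixel being estimated, and $a$ and $b$ control the range of the bandwidth. If $r$ is small, the local region is considered relatively smooth and $\sigma$ takes a larger value; if $r$ is large, the pixel probably lies at a depth discontinuity and $\sigma$ approaches its minimum $a$. The parameter $\varepsilon$ controls the width of the transition between smooth and discontinuous regions: the larger $\varepsilon$ is, the steeper the transition and the more sensitive $\sigma$ is to smoothness in that region (pink curve in Fig. 7). The parameter $\tau$ controls the relative shift of the center of the transition region, shown by the blue and green curves. The values of $a$, $b$, $\varepsilon$, and $\tau$ are determined by the characteristics of the depth sensor and can be measured experimentally.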
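Eq. (14) is simple enough to evaluate directly. The Python sketch below uses the values reported in the experimental setup for $\sigma_s$ on the synthetic dataset ($a = 0.3$, $b = 0.15$, $\varepsilon = 1.0$, $\tau = 0.36$); the function name `adaptive_bandwidth` and the sample values of $r$ are illustrative.

```python
import math

def adaptive_bandwidth(r, a, b, eps, tau):
    """Bandwidth of Eq. (14): sigma = a + b * exp(-eps * (r + tau))."""
    return a + b * math.exp(-eps * (r + tau))

# Reported setting for sigma_s on the synthetic dataset.
params = dict(a=0.3, b=0.15, eps=1.0, tau=0.36)

sigma_smooth = adaptive_bandwidth(0.0, **params)   # small JWIV: smooth region
sigma_edge = adaptive_bandwidth(5.0, **params)     # large JWIV: depth discontinuity
print(sigma_smooth, sigma_edge)
```

The bandwidth decreases monotonically with $r$ and never falls below $a$, so smooth regions are filtered with a wider kernel than depth discontinuities.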

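Fig. 8 compares depth reconstruction results for different bandwidth parameters. The first row of Fig. 8(c) shows upsampling with large bandwidths, where some depth discontinuities are blurred. The second row of Fig. 8(c) shows the result with small bandwidths, which preserves depth discontinuities but exhibits noise such as texture artifacts. Fig. 8(d) shows the proposed parameter adaptation, which preserves depth discontinuities and smooths noise; the upsampling factor is 8×. Fig. 8(e) shows the proposed RGB-D structure similarity measure JWIV, which serves as the smoothness criterion; it essentially reflects depth discontinuity and is unaffected by incoherent texture.

Fig. 8 Comparison of depth reconstruction results in terms of different bandwidth parameter sizes ((a) color images; (b) ground truth; (c) large bandwidth for the first row and small bandwidth for the second row; (d) adaptive bandwidth based on Eq. (14); (e) the corresponding JWIV maps based on Eq. (8))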

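3 Experimental results and analysis

3.1 Experimental setup

Experiments were run on a PC with an Intel Xeon E5-2670 CPU (2.60 GHz × 16 cores) and 16 GB RAM under 64-bit Ubuntu 16.04; the algorithm was implemented in MATLAB 2016b.

1) Datasets. Experiments were conducted on the Middlebury dataset^[28-29] and on the real-scene ToFMark^[16] and NYU^[30] datasets. The Middlebury synthetic dataset is a depth dataset generated with structured-light technology and includes ground truth; the experiments use the raw depth maps, in which some depth values are missing. The ToFMark dataset consists of three real-scene depth maps captured with a PMD Nano ToF camera, and the NYU dataset consists of indoor depth maps captured with a Kinect camera. In addition, depth maps captured on a color-depth camera platform built by the authors are compared qualitatively. The high-resolution camera is a Point Grey® GZL-CL-41C6M-C with a resolution of 2048×2048 pixels, and the ToF camera is a PMD® Camcube 2.0 with a resolution of 204×204 pixels; the two cameras were calibrated with a Halcon calibration board.

2) Parameter settings. The image patch size is 9×9, the nonlocal means patch size in Eq. (6) is 7×7, and the hyperparameter $\lambda$ in Eq. (13) is 1.5. The optimal values of the bandwidth parameters depend on the characteristics of the depth sensor. Assuming depth maps normalized to [0, 1], the measured values of $(a, b, \varepsilon, \tau)$ for $\sigma_s$, $\sigma_r$, and $\sigma_n$ are as follows: on the synthetic dataset, (0.3, 0.15, 1.0, 0.36), (0.025, 0.025, 1.0, 0.18), and (0.25, 0.15, 1.0, 0.18); on real depth maps without holes, (0.25, 0.08, 1.0, 0.36), (0.08, 0.04, 1.0, 0.18), and (2.0, 1.0, 1.0, 0.18); on real depth maps with holes, (0.8, 0.5, 1.0, 0.36), (0.06, 0.04, 1.0, 0.18), and (2.0, 1.0, 1.0, 0.18).

3) Evaluation metric. Root mean square error has been used to evaluate depth image super-resolution algorithms^{[16, 19]}, but Liu et al.^[31] found that in depth reconstruction this metric is insensitive to blurred depth boundaries and, in regions with sharp boundaries, over-amplifies the error caused by a few outlying depth values. This paper therefore uses the mean absolute deviation (MAD), as in existing work^{[20, 32]}.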
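For reference, the two metrics discussed above can be written as follows in Python for depth maps stored as lists of rows. This is a generic sketch of MAD and RMSE rather than the authors' evaluation code; the function names and the toy values are illustrative.

```python
import math

def mean_absolute_deviation(pred, truth):
    """Average of |pred - truth| over all pixels."""
    diffs = [abs(a - b) for row_p, row_t in zip(pred, truth)
             for a, b in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)

def root_mean_square_error(pred, truth):
    """Square root of the average squared difference over all pixels."""
    sq = [(a - b) ** 2 for row_p, row_t in zip(pred, truth)
          for a, b in zip(row_p, row_t)]
    return math.sqrt(sum(sq) / len(sq))

pred = [[1.0, 2.0], [3.0, 4.0]]
truth = [[1.0, 1.0], [5.0, 4.0]]
print(mean_absolute_deviation(pred, truth))   # 0.75
print(root_mean_square_error(pred, truth))
```

In this toy case the single error of 2 contributes 4 of the 5 units of squared error but only 2 of the 3 units of absolute error, which illustrates how RMSE weights outlying depth values more heavily than MAD.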

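3.2 Analysis of experimental results

3.2.1 Effect of the RGB-D structure similarity measure and parameter adaptation on reconstruction

The contributions of the RGB-D structure similarity measure and of bandwidth adaptation were each evaluated quantitatively on the Middlebury dataset by comparing the average MAD at upsampling factors of 2×, 4×, 8×, and 16×. Fig. 9 shows the quantitative comparison for the structure similarity measure JWIV and for parameter adaptation.

Fig. 9 Influence of different methods on reconstruction ((a) quantitative comparisons with and without JWIV; (b) quantitative comparisons with and without parameter adaptation)

As the blue bars in Fig. 9 show, using the structure similarity measure JWIV and using parameter adaptation both yield lower errors. The gap widens as the upsampling factor grows, because at large factors the low-resolution depth map provides increasingly blurred structural information, whereas the proposed method makes full use of the color image to obtain effective structural guidance.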

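3.2.2 Comparison with other methods

In addition to bicubic interpolation, the proposed method is compared with color-guided depth reconstruction algorithms in common use in recent years: guided image filtering (GF)^[15], MRF-based upsampling^[17], edge-weighted nonlocal means regularization^[19], total generalized variation (TGV)^[16], CNN-based joint image filtering (DJIF)^[11], joint static and dynamic guidance filtering (SDF)^[33], and the recently proposed depth reconstruction with learned dynamic guidance (GDE)^[10].

1) Synthetic dataset experiments. Experiments on the Middlebury synthetic dataset use six scenes: Art, Books, Dolls, Laundry, Moebius, and Reindeer. Tables 1 and 2 give quantitative comparisons of the upsampling results of the compared methods; the proposed method consistently outperforms the others on all test images and has the lowest MAD.

Table 1 Quantitative comparison of mean absolute deviation (MAD) on the Middlebury dataset (A)

Method	Art				Books				Dolls
	2×	4×	8×	16×	2×	4×	8×	16×	2×	4×	8×	16×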
Bicubic	0.85	1.45	2.66	4.86	0.72	1.15	1.70	2.59	0.83	1.23	1.90	2.68
MRF^[17]	0.64	1.38	2.90	5.59	0.17	0.37	0.74	1.52	0.24	0.50	1.05	2.08
Park^[19]	0.55	0.69	1.69	4.61	0.24	0.30	0.72	2.37	0.29	0.36	0.81	2.28
GF^[15]	1.02	1.65	2.79	4.79	0.90	1.39	1.90	3.12	0.96	1.39	1.98	2.86
TGV^[16]	0.65	0.91	3.11	7.54	0.59	0.76	2.31	6.27	0.53	0.56	2.18	6.66
DJIF^[11]	-	1.47	1.63	3.04	-	1.39	1.23	1.94	-	1.42	1.30	2.01
SDF^[33]	0.49	1.13	2.60	5.25	0.27	0.53	1.08	2.36	0.46	0.91	1.48	2.82
GDE^[10]	0.55	1.05	1.93	3.68	0.66	0.96	1.39	1.97	0.77	0.99	1.30	1.79
Ours	0.20	0.40	0.66	1.34	0.09	0.20	0.36	0.72	0.12	0.22	0.40	0.73
Note: bold indicates the best result.

Table 2 Quantitative comparison of mean absolute deviation (MAD) on the Middlebury dataset (B)

Method	Laundry				Moebius				Reindeer
	2×	4×	8×	16×	2×	4×	8×	16×	2×	4×	8×	16×
Bicubic	0.60	0.97	1.59	2.63	0.68	1.06	1.75	2.78	0.57	0.89	1.62	3.01
MRF^[17]	0.27	0.58	1.20	2.50	0.24	0.53	1.13	2.33	0.30	0.65	1.37	2.78
Park^[19]	0.31	0.38	0.88	2.45	0.24	0.31	0.70	2.02	0.34	0.42	0.98	2.83
GF^[15]	0.68	0.99	1.52	2.62	0.81	1.20	1.86	3.00	0.68	1.07	1.69	3.04
TGV^[16]	0.44	0.63	2.45	6.58	0.46	0.55	1.95	7.03	0.43	0.54	2.31	8.10
DJIF^[11]	-	0.99	1.07	1.77	-	1.10	1.25	2.06	-	1.02	1.09	1.95
SDF^[33]	0.32	0.63	1.42	3.26	0.30	0.60	1.22	2.51	0.38	0.76	1.45	3.14
GDE^[10]	0.59	0.87	1.36	2.14	0.56	0.81	1.20	1.81	0.46	0.72	1.16	1.91
Ours	0.12	0.23	0.41	0.79	0.10	0.20	0.38	0.71	0.12	0.24	0.38	0.68
Note: bold indicates the best result.
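The relative improvement over the second-best method can be computed per table cell as shown in this Python sketch, which uses the Art column at 8× from Table 1. It only illustrates the arithmetic for one cell and makes no claim to reproduce the dataset-level averages quoted in the abstract; the function name is hypothetical.

```python
def relative_reduction(ours, others):
    """Fractional MAD reduction of `ours` relative to the best competing value."""
    second_best = min(others)
    return (second_best - ours) / second_best

# Art, 8x, from Table 1: Bicubic, MRF, Park, GF, TGV, DJIF, SDF, GDE.
art_8x_others = [2.66, 2.90, 1.69, 2.79, 3.11, 1.63, 2.60, 1.93]
art_8x_ours = 0.66

reduction = relative_reduction(art_8x_ours, art_8x_others)
print(f"{100.0 * reduction:.1f}%")
```

For this cell the best competing value is 1.63 (DJIF), so the reduction is (1.63 - 0.66) / 1.63, or roughly 60%.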

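As shown in Fig. 10, the Dolls and Reindeer images were compared qualitatively at 8×. The SDF method^[33] produces results close to ours, but where edges in the guidance image are weak the corresponding depth discontinuities are severely blurred. The MRF^[17] and Park^[19] results show artifacts and blurring at depth edges, and the TGV method^[16] introduces texture copying artifacts where the registered color image is richly textured. The proposed method suppresses texture artifacts while preserving depth discontinuities. In the red rectangle in the first example of Fig. 10(g), the hair has an orange-red color similar to its background, so the color structure is inconsistent with the depth map; the proposed method handles this case well. The JWIV maps in Fig. 10(h) accurately reflect the characteristics of the depth image, which helps the adaptive selection of patches and weight parameters and helps resolve the inconsistency between color edges and depth discontinuities.

Fig. 10 8× upsampling results of Middlebury dataset ((a) color images; (b) ground truth; (c) MRF^[17]; (d) TGV^[16]; (e) Park^[19]; (f) SDF^[33]; (g) our method; (h) JWIV maps)

2) ToF dataset experiments. The algorithm was evaluated on the real-scene ToFMark dataset. Because the provided raw depth images are missing many depth values, the reconstruction results of Ref. [20] are used as ground truth for quantitative analysis. Fig. 11 gives a visual comparison of the proposed method with other methods, and Table 3 gives the quantitative results. Unlike on the synthetic dataset, the other methods^{[15-16, 19, 33]} produce very obvious texture artifacts in richly textured regions of the color image, whereas Fig. 11(g) shows that the proposed method resolves this problem. The JWIV maps in Fig. 11(h) are highly robust to texture and noise, so the measure also applies to real-scene depth data.

Fig. 11 8× upsampling results of ToFMark dataset ((a) intensity images; (b) ground truth; (c) GF^[15]; (d) Park^[19]; (e) SDF^[33]; (f) TGV^[16]; (g) our method; (h) JWIV maps)

Table 3 Quantitative comparison of mean absolute deviation (MAD) on ToFMark dataset

Method	Books				Devil				Shark
	2×	4×	8×	16×	2×	4×	8×	16×	2×	4×	8×	16×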
Bicubic	0.65	1.44	3.46	4.74	0.44	0.96	2.19	2.58	0.72	1.51	3.75	5.23
MRF^[17]	0.54	1.22	2.62	5.62	0.30	0.79	1.89	4.42	0.61	1.15	2.24	4.72
Park^[19]	0.63	1.44	3.28	6.70	0.39	0.98	2.27	4.95	0.62	1.42	3.40	7.64
GF^[15]	0.85	1.74	3.53	6.69	0.52	1.21	2.89	5.92	0.97	1.83	3.90	8.08
TGV^[16]	0.29	0.62	2.53	9.46	0.21	0.44	2.13	8.50	0.42	0.87	2.56	9.10
DJIF^[11]	-	1.05	1.69	3.21	-	0.79	1.00	1.85	-	1.18	1.88	3.16
SDF^[33]	1.10	2.16	3.90	7.28	0.86	1.95	4.05	7.59	1.23	2.32	4.27	7.79
GDE^[10]	0.78	1.34	2.88	5.45	0.70	1.00	1.81	3.43	0.96	1.66	3.58	6.08
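Ours	0.27	0.57	1.01	1.96	0.18	0.32	0.57	1.39	0.37	0.69	1.05	2.01
Note: bold indicates the best result.

Fig. 12 gives a qualitative comparison of reconstructions of depth maps captured on the authors' experimental platform. To reduce the error introduced by camera calibration, 4× depth map super-resolution was performed. Compared with the other methods, the proposed method produces less noise at edges and is less affected by texture.

Fig. 12 4× upsampling results of PMD dataset ((a) intensity and depth images; (b) GF^[15]; (c) MRF^[17]; (d) Park^[19]; (e) TGV^[16]; (f) our method; (g) JWIV maps)

3) Kinect dataset experiments. For the real-scene NYU dataset, 449 RGB-D image pairs were randomly selected for analysis. Table 4 compares average MAD (some results were provided by Li et al.^[11]); the proposed method achieves the lowest average MAD over all 449 image pairs. Fig. 13 shows two indoor scenes for visual comparison. The MRF^[17] and GF^[15] methods over-smooth depth discontinuities, the method of Park et al.^[19] tends to produce jagged artifacts around depth discontinuities, and the TGV method^[16] preserves sharp depth discontinuities but is overly sensitive to unrelated texture in the color image and may produce texture artifacts in smooth depth regions. The proposed method preserves discontinuous edges well while suppressing the texture copying artifacts caused by the inconsistency between color edges and depth discontinuities. This further evaluation on the NYU dataset verifies the effectiveness of the algorithm on real, complex indoor scenes.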

Table 4 Quantitative comparison of mean absolute deviation (MAD) on the NYU dataset

Method	2×	4×	8×	16×
Bicubic	0.57	1.50	3.17	6.04
MRF^[17]	-	1.47	3.14	6.01
Park^[19]	-	0.92	2.09	4.97
GF^[15]	-	1.45	3.09	5.99
TGV^[16]	-	1.20	2.70	9.08
DJIF^[11]	-	0.84	1.42	2.43
SDF^[33]	-	0.87	2.45	4.78
GDE^[10]	0.91	1.13	1.74	3.02
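Ours	0.26	0.58	1.35	2.43
Note: bold indicates the best result.

Fig. 13 8× upsampling results of NYU dataset ((a) color images; (b) ground truth; (c) MRF^[17]; (d) GF^[15]; (e) Park^[19]; (f) TGV^[16]; (g) our method; (h) JWIV maps)

Experiments on holes in depth images were also carried out on the NYU dataset. As shown in Fig. 14, the raw depth maps contain many holes, especially around depth discontinuities. Because some of the compared methods cannot be applied directly to upsampling raw depth maps, the qualitative comparison is limited to MRF^[17], TGV^[16], and SDF^[33]. The TGV method^[16] can handle small holes but not large ones, the MRF method^[17] fills holes but tends to over-smooth depth discontinuities, and the SDF method^[33] introduces jagged artifacts along depth edges. The proposed method repairs depth holes effectively and preserves depth discontinuities; the JWIV structure similarity measure again plays an important role, and multi-stage reconstruction also helps repair holes. The experiments on the NYU dataset show that the proposed method can repair holes while performing depth map super-resolution reconstruction, producing high-quality depth images.

Fig. 14 8× upsampling results of NYU dataset with depth holes ((a) color images; (b) raw depth images; (c) TGV^[16]; (d) MRF^[17]; (e) SDF^[33]; (f) our method; (g) JWIV maps)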

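3.2.3 Comparison of running time

The time efficiency of the algorithms was compared using the average computation time per image on the three datasets. As shown in Table 5, GDE^[10] and DJIF^[11] both reconstruct depth directly with pre-trained models; compared with the other three algorithms, the proposed algorithm is the most efficient.

Table 5 Computation time of different algorithms (s)

Dataset	GDE^[10]	DJIF^[11]	Park^[19]	SDF^[33]	TGV^[16]	Ours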
Middlebury	20	12	236	177	751	167
ToFMark	8	5	77	56	230	53
NYU	4	3	72	36	155	34
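Note: bold indicates the best result.

4 Conclusion

This paper studies super-resolution reconstruction of low-resolution depth maps guided by color images and proposes a depth map super-resolution algorithm that is robust to the inconsistency between color edges and depth discontinuities. The proposed RGB-D structure similarity measure extracts the edges in the color image that are consistent with the depth map and drives the adaptive selection of each pixel's neighborhood patch, which preserves depth discontinuities. The oriented nonlocal means weight is combined with the spatial and range kernels for multilateral guided depth estimation, which resolves the color-depth inconsistency, and patch-based parameter adaptation then yields robust depth map super-resolution reconstruction. Experimental results show that, compared with other methods, the proposed method effectively resolves the inconsistency between color and depth images on both synthetic and real datasets, suppresses texture copying artifacts, repairs holes well, and estimates depth more accurately. However, the reconstruction quality of the method degrades under highlights and shadows. Future work will study and address the influence of highlights and shadows on depth estimation and will use a sliding-window scheme to improve the computational efficiency of the algorithm.

References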

[1] Nakagawa Y, Kihara K, Tadoh R, et al. Super resolving of the depth map for 3D reconstruction of underwater terrain using kinect[C]//Proceedings of the 22nd International Conference on Parallel and Distributed Systems. Wuhan, China: IEEE, 2016: 1237-1240.[DOI:10.1109/icpads.2016.0168]

[2] Häne C, Zach C, Lim J, et al. Stereo depth map fusion for robot navigation[C]//Proceedings of 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. San Francisco, CA: IEEE, 2011: 1618-1625.[DOI:10.1109/iros.2011.6094704]

[3] Chen C F, Bolas M, Rosenberg E S. Rapid creation of photorealistic virtual reality content with consumer depth cameras[C]//Proceedings of 2017 IEEE Virtual Reality. Los Angeles, CA: IEEE, 2017: 473-474.[DOI:10.1109/vr.2017.7892385]

[4] Yang Y X, Gao M Y, Yin K, et al. High-quality depth map reconstruction combining stereo image pair[J]. Journal of Image and Graphics, 2015, 20(1): 1–10. [杨宇翔, 高明煜, 尹克, 等. 结合同场景立体图对的高质量深度图像重建[J]. 中国图象图形学报, 2015, 20(1): 1–10. ] [DOI:10.11834/jig.20150101]

[5] Xiao L, Heide F, O'Toole M, et al. Defocus deblurring and superresolution for time-of-flight depth cameras[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA: IEEE, 2015: 2376-2384.[DOI:10.1109/cvpr.2015.7298851]

[6] Smisek J, Jancosek M, Pajdla T. 3D with Kinect[M]//Fossati A, Gall J, Grabner H, et al. Consumer Depth Cameras for Computer Vision: Research Topics and Applications. London: Springer, 2013: 3-25.[DOI:10.1007/978-1-4471-4640-7_1]

[7] Guðmundsson S Á, Aanaes H, Larsen R. Fusion of stereo vision and time-of-flight imaging for improved 3d estimation[J]. International Journal of Intelligent Systems Technologies and Applications, 2008, 5(3-4): 425–433. [DOI:10.1504/ijista.2008.021305]

[8] Li L. Time-of-flight camera – an introduction[R]. Technical White Paper SLOA190B. Dallas: Texas Instruments, 2014.

[9] Hansard M, Lee S, Choi O, et al. Time-of-Flight Cameras: Principles, Methods and Applications[M]. London: Springer, 2013.

[10] Gu S H, Zuo W M, Guo S, et al. Learning dynamic guidance for depth image enhancement[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI: IEEE, 2017: 712-721.[DOI:10.1109/cvpr.2017.83]

[11] Li Y J, Huang J B, Ahuja N, et al. Deep joint image filtering[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 154-169.[DOI:10.1007/978-3-319-46493-0_10]

[12] Bartczak B, Koch R. Dense depth maps from low resolution time-of-flight depth and high resolution color views[C]//The 5th International Symposium on Advances in Visual Computing. Las Vegas, NV: Springer, 2009: 228-239.[DOI:10.1007/978-3-642-10520-3_21]

[13] Kopf J, Cohen M F, Lischinski D, et al. Joint bilateral upsampling[J]. ACM Transactions on Graphics, 2007, 26(3). [DOI:10.1145/1276377.1276497]

[14] Shen X Y, Zhou C, Xu L, et al. Mutual-structure for joint filtering[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3406-3414.[DOI:10.1109/iccv.2015.389]

[15] He K M, Sun J, Tang X O. Guided image filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(6): 1397–1409. [DOI:10.1109/TPAMI.2012.213]

[16] Ferstl D, Reinbacher C, Ranftl R, et al. Image guided depth upsampling using anisotropic total generalized variation[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW: IEEE, 2013: 993-1000.[DOI:10.1109/iccv.2013.127]

[17] Diebel J, Thrun S. An application of Markov random fields to range sensing[C]//Proceedings of the 18th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2005: 291-298.

[18] Harrison A, Newman P. Image and sparse laser fusion for dense scene reconstruction[C]//Proceedings of the 7th International Conference on Field and Service Robotics. Berlin, Heidelberg: Springer, 2010: 219-228.[DOI:10.1007/978-3-642-13408-1_20]

[19] Park J, Kim H, Tai Y W, et al. High quality depth map upsampling for 3D-TOF cameras[C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011: 1623-1630.[DOI:10.1109/iccv.2011.6126423]

[20] Yang J Y, Ye X C, Li K, et al. Depth recovery using an adaptive color-guided auto-regressive model[C]//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 2012: 158-171.[DOI:10.1007/978-3-642-33715-4_12]

[21] Chan D, Buisman H, Theobalt C, et al. A noise-aware filter for real-time depth upsampling[C]//Proceedings of Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications. Marseille, France: HAL, 2008.

[22] Yang Q X, Yang R G, Davis J, et al. Spatial-depth super resolution for range images[C]//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007: 1-8.[DOI:10.1109/cvpr.2007.383211]

[23] Fukushima N, Takeuchi K, Kojima A. Self-similarity matching with predictive linear upsampling for depth map[C]//Proceedings of 2016 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video. Hamburg, Germany: IEEE, 2016: 1-4.[DOI:10.1109/3dtv.2016.7548889]

[24] Min D B, Lu J B, Do M N. Depth video enhancement based on weighted mode filtering[J]. IEEE Transactions on Image Processing, 2012, 21(3): 1176–1190. [DOI:10.1109/tip.2011.2163164]

[25] Zuo X X, Zheng J B. A refined weighted mode filtering approach for depth video enhancement[C]//Proceedings of 2013 International Conference on Virtual Reality and Visualization. Xi'an, China: IEEE, 2013: 138-144.[DOI:10.1109/icvrv.2013.30]

[26] Li Z G, Zheng J H, Zhu Z J, et al. Weighted guided image filtering[J]. IEEE Transactions on Image Processing, 2015, 24(1): 120–129. [DOI:10.1109/TIP.2014.2371234]

[27] Chen J, Tang C K, Wang J. Noise brush: interactive high quality image-noise separation[J]. ACM Transactions on Graphics, 2009, 28(5): #146. [DOI:10.1145/1661412.1618492]

[28] Scharstein D, Pal C. Learning conditional random fields for stereo[C]//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007: 1-8.[DOI:10.1109/cvpr.2007.383191]

[29] Hirschmuller H, Scharstein D. Evaluation of cost functions for stereo matching[C]//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007: 1-8.[DOI:10.1109/cvpr.2007.383248]

[30] Silberman N, Hoiem D, Kohli P, et al. Indoor segmentation and support inference from RGBD images[C]//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 2012: 746-760.[DOI:10.1007/978-3-642-33715-4_54]

[31] Liu M Y, Tuzel O, Taguchi Y. Joint geodesic upsampling of depth images[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR: IEEE, 2013: 169-176.[DOI:10.1109/cvpr.2013.29]

[32] Liu W, Chen X G, Yang J, et al. Robust color guided depth map restoration[J]. IEEE Transactions on Image Processing, 2017, 26(1): 315–327. [DOI:10.1109/tip.2016.2612826]

[33] Ham B, Cho M, Ponce J. Robust image filtering using joint static and dynamic guidance[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA: IEEE, 2015: 4823-4831.[DOI:10.1109/cvpr.2015.7299115]