Published: 2018-03-16
DOI: 10.11834/jig.170436
2018 | Volume 23 | Number 3

Image Understanding and Computer Vision

图像深度估计硬件实现算法

杨媛, 陈福

西安理工大学自动化与信息工程学院, 西安 710048

Received: 2017-08-17; Revised: 2017-11-08

基金项目: 国家自然科学基金项目（61102017）

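First author: Yang Yuan (b. 1974), female, professor. She received her Ph.D. in microelectronics and solid-state electronics from Xi'an University of Technology in 2004. Her main research interest is very-large-scale integrated circuit design. E-mail: yangyuan@xaut.edu.cn

CLC number: TP391

Document code: A

Article ID: 1006-8961(2018)03-0362-10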

摘要

目的近年来，3DTV（3-dimension television）与VR（virtual reality）技术迅速发展，但3D内容的短缺却成为该类技术发展的瓶颈。为快速提供更多的3D内容，需将现有的2D视频转换为3D视频。深度估计是2D转3D技术的关键，为满足转换过程中实时性较高的要求，本文提出基于相对高度深度线索方法的硬件实现方案。方法首先对灰度图进行Sobel边缘检测得到边缘图，然后对其进行线性追踪以及深度赋值完成深度估计得到深度图。在硬件实现方案中，Sobel边缘检测采用五级流水设计以及并行线轨迹计算方式，充分利用硬件设计的并行性，以提高系统的处理效率；在深度估计中通过等效处理简化“能量函数”的方式将算法中大量的乘法、除法以及指数运算简化成加法、减法和比较运算，以减小硬件资源开销；同时方案设计中巧妙借助SDRAM（synchronous dynamic random access memory）突发特性完成行列转换，节省系统硬件资源。结果最后完成了算法的FPGA（field programmable gate array）实现，并选取了2幅图像进行深度信息提取。将本文方法的软硬件处理效果与基于运动估计的深度图提取方法进行对比，结果表明本文算法相较于运动估计方法对图像深度图提取效果更好，同时硬件处理可以实现对2D图像的深度信息提取，且具有和软件处理一致的效果。在100 MHz的时钟频率下，估算帧率可达33.18帧/s。结论本文提出的硬件实现方案可以完成对单幅图像的深度信息提取且估算帧率远大于3DTV等3维视频应用中实时要求的24帧/s，具有很好的实时性和可移植性，为后期的视频信息处理奠定了基础。

关键词

相对高度深度线索; Sobel; 线性追踪; 五级流水; 深度图

Hardware implementation algorithm of image depth estimation

Yang Yuan, Chen Fu

Faculty of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China

Supported by: National Natural Science Foundation of China (61102017)

Abstract

Objective In recent years, 3D television (3DTV) and virtual reality (VR) technologies have developed rapidly, but the shortage of 3D content has become a bottleneck for their development. Converting existing 2D videos to 3D is a fast way to provide more 3D content. Depth estimation is the key step of 2D-to-3D conversion, and hardware implementation is an effective way to meet the real-time requirements of the conversion process. Most depth estimation algorithms, however, are highly complex to implement in hardware. Balancing depth estimation quality against ease of implementation, this study proposes a hardware implementation scheme based on the relative height depth cue method that achieves high-speed processing while saving hardware resources. Method At the algorithm level, a color image is first converted to grayscale, and an edge map is obtained by Sobel edge detection on the grayscale image. Line traces are obtained by a line tracing algorithm, and the depth map is obtained by assigning depth values to the regions between line traces. In the hardware implementation, Sobel edge detection uses a five-stage pipeline and the line traces are computed in parallel, exploiting the parallelism of hardware to improve system efficiency. In depth estimation, the energy function is simplified by equivalent processing, so that a large number of multiplication, division, and exponential operations are replaced by addition, subtraction, and comparison operations. More than 2.3 million multiplications, as many divisions, and more than 780 thousand exponential operations are eliminated, thereby reducing hardware resource cost. Because line tracing and depth assignment operate column by column, the edge map must be converted from row order to column order. In this design, the burst mode of the SDRAM is used to perform the row-column conversion, which saves hardware resources. The scheme is designed in Verilog HDL.
Result Two typical images, one of buildings and one of people, are selected, and the algorithm is verified on the Altera DE2-115 FPGA platform to demonstrate the feasibility of the hardware implementation. Verification proceeds as follows. First, the Verilog HDL design is simulated in Quartus II. A grayscale image of 1 024×768 pixels is then downloaded to the FPGA through a serial port, and the FPGA estimates the depth map. The data are sent back to the PC through the serial port, and the depth map is plotted in MATLAB. Simulation and verification results show that the proposed hardware implementation extracts the depth of 2D images correctly, with an estimated frame rate of up to 33.18 frames/s at a 100 MHz clock frequency. Finally, the hardware result is compared with the software result of the same method and with a typical motion estimation algorithm by computing the peak signal-to-noise ratio (PSNR). The PSNR values for the building image are 13.028 dB (motion estimation), 13.147 dB (software), and 13.208 4 dB (hardware), and those for the person image are 10.94 dB, 11.072 8 dB, and 10.980 4 dB, respectively. Thus, the proposed algorithm is more effective than the motion estimation method, and the hardware implementation extracts depth from 2D images with results consistent with the software implementation. Conclusion The proposed hardware implementation can extract the depth information of a single image. The estimated frame rate exceeds the real-time requirement of 3D video applications such as 3DTV (24 frames/s); the design has good real-time performance and portability and lays a foundation for video processing. However, as with methods based on other typical algorithms such as motion estimation, the edges of the extracted depth map remain sharp and contain burrs. Future work can use a digital filter to smooth the depth map and improve its quality.

Key words

relative height depth clue; Sobel; linear tracing; five-stage pipeline; depth map

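0 Introduction

As 3DTV and VR are commercialized and industrialized at an accelerating pace, expectations for digital video keep rising, and conventional 2D digital video can no longer satisfy viewers' strong demand for stereoscopic realism^[1]. Because these technologies are still young, however, content providers cannot supply large volumes of 3D video in a short time, so 3D content production has become the bottleneck for their development^[2-4].

3D content is currently produced in two main ways. 1) Active vision^[5]: 3D devices such as depth cameras^[6] and 3D scanners directly capture the 3D information of a real scene as stereoscopic material. This approach acquires 3D information fairly accurately, but the equipment is complex and expensive and places high demands on the scene. 2) Passive vision^[7]: the 3D information of the scene is estimated computationally, yielding approximate depth values. 2D-to-3D conversion is the main representative of passive vision. It can perform high-quality video conversion at the terminal without increasing transmission bandwidth, costs far less than direct 3D shooting, and eases the shortage of stereoscopic display material.

In 2D-to-3D conversion, obtaining accurate image depth information is the key to successful 3D video conversion^[8]. Many depth estimation methods exist, mainly the relative height depth cue^[9], motion estimation^[10], geometric perspective^[11], atmospheric scattering^[12], pattern texture, and statistical models^[13]. Most of them have high algorithmic complexity, which makes hardware implementation difficult. Taking depth estimation quality into account, this paper adopts the depth estimation algorithm based on the relative height depth cue to extract image depth information and, through equivalent processing, reduces the many multiplications, divisions, and exponential operations in the algorithm to additions, subtractions, and comparisons. This makes the algorithm implementable on a hardware platform and provides relatively high-quality depth information for 2D-to-3D conversion.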

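1 Algorithm framework

The relative height depth cue states that the farther an object is from the camera, the smaller its depth value, and the nearer it is, the larger its depth value^[14]. The depth estimation algorithm based on this cue proceeds as follows: the color image is converted to a grayscale image, the Sobel operator detects the edges of the grayscale image, a line tracing algorithm produces the line-trace map, and depth values are assigned to the line-trace map to obtain the depth map. Fig. 1 shows the overall flow of the algorithm. Because this paper focuses on the hardware implementation of depth estimation, the grayscale image is taken as the input for all subsequent processing.

Fig. 1 Block diagram of the depth estimation algorithm based on the relative height depth cue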

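1.1 Edge detection

Edge detection marks the positions in the grayscale image where brightness changes sharply^[15]. The marking is generally done with a detection operator; common choices include the Sobel^[16], Prewitt^[17], Canny^[18], and Laplacian^[19] operators. The Prewitt operator produces relatively blurred edges, and the Canny and Laplacian operators are second-order operators that are ill-suited to hardware implementation. Weighing detection quality against implementation difficulty, this paper uses the Sobel operator for edge detection. The Sobel operator consists of two 3×3 detection templates, ${\mathit{\boldsymbol{G}}_x}$ and ${\mathit{\boldsymbol{G}}_y}$, for the horizontal and vertical directions. Convolving each with the image template window gives approximate horizontal and vertical brightness differences, from which the total gradient is obtained. The computation is as follows.

Horizontal gradient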

$ {\mathit{\boldsymbol{G}}_x} \times {\mathit{\boldsymbol{G}}_p} = \left( {{p_{13}} + 2{p_{23}} + {p_{33}}} \right) - \left( {{p_{11}} + 2{p_{21}} + {p_{31}}} \right) $

Vertical gradient

$ {\mathit{\boldsymbol{G}}_y} \times {\mathit{\boldsymbol{G}}_p} = \left( {{p_{31}} + 2{p_{32}} + {p_{33}}} \right) - \left( {{p_{11}} + 2{p_{12}} + {p_{13}}} \right) $

The total gradient is $\mathit{\boldsymbol{G}} = \sqrt {{{\left( {{\mathit{\boldsymbol{G}}_x} \times {\mathit{\boldsymbol{G}}_p}} \right)}^2} + {{\left( {{\mathit{\boldsymbol{G}}_y} \times {\mathit{\boldsymbol{G}}_p}} \right)}^2}} $. To simplify the computation, it can be approximated by $\mathit{\boldsymbol{G'}} = |{\mathit{\boldsymbol{G}}_x} \times {\mathit{\boldsymbol{G}}_p}| + |{\mathit{\boldsymbol{G}}_y} \times {\mathit{\boldsymbol{G}}_p}|$.

Image template window

$ {\mathit{\boldsymbol{G}}_p} = \left[{\begin{array}{*{20}{c}} {{p_{11}}}&{{p_{12}}}&{{p_{13}}}\\ {{p_{21}}}&{{p_{22}}}&{{p_{23}}}\\ {{p_{31}}}&{{p_{32}}}&{{p_{33}}} \end{array}} \right] $

Horizontal operator

$ {\mathit{\boldsymbol{G}}_x} = \left[{\begin{array}{*{20}{c}} {-1}&0&{ + 1}\\ {-2}&0&{ + 2}\\ {-1}&0&{ + 1} \end{array}} \right] $

Vertical operator

$ {\mathit{\boldsymbol{G}}_y} = \left[{\begin{array}{*{20}{c}} {-1}&{-2}&{-1}\\ 0&0&0\\ { + 1}&{ + 2}&{ + 1} \end{array}} \right] $

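After the gradient value $\mathit{\boldsymbol{G}}$ is computed, every pixel of the edge map is assigned a value: if $\mathit{\boldsymbol{G}}$ exceeds the given threshold ${\mathit{\boldsymbol{G}}_t}$, the edge value is set to 255; otherwise it is set to 0. The threshold ${\mathit{\boldsymbol{G}}_t}$ is not a fixed constant; it was set to 17 experimentally. In addition, because the Sobel templates are 3×3 and edge detection mainly marks object boundaries, skipping the convolution on the image border (the outermost rows and columns) does not affect the final edge map, so this paper leaves the border unprocessed.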
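The edge detection step described above can be sketched in software as follows. This is a minimal illustration, not the paper's Verilog design: the function name `sobel_edge_map` is hypothetical, the image is assumed to be a list of rows of 8-bit gray levels, and leaving border pixels at 0 is an assumption (the text only says the border is not processed).

```python
# Illustrative sketch; names are hypothetical and not taken from the paper.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal operator G_x
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical operator G_y


def sobel_edge_map(gray, threshold=17):
    """Return a binary edge map (0 or 255) using G' = |Gx*Gp| + |Gy*Gp|."""
    height, width = len(gray), len(gray[0])
    # Assumption: border pixels, which are not convolved, are left at 0.
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = gray[r + i - 1][c + j - 1]
                    gx += GX[i][j] * p
                    gy += GY[i][j] * p
            # Approximate total gradient, then threshold (17 in the paper).
            if abs(gx) + abs(gy) > threshold:
                edges[r][c] = 255
    return edges
```

For an image whose left two columns are dark and whose remaining columns are bright, the two interior columns adjacent to the step are marked as edges and all other pixels stay at 0.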

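1.2 Line tracing and depth assignment

Line tracing is the key to extracting depth information from the edge map. It follows “strong edges” in the edge map from the left boundary to the right boundary, producing the line-trace map. According to Ref. [9], tracing must satisfy five rules: 1) the number of line traces is constant and greater than 5; 2) line traces must not cross one another; 3) the slope of a line trace must not be infinite at any point; 4) every line trace must extend from the left boundary to the right boundary; 5) each finite region bounded by line traces is assigned a constant depth value. The procedure is as follows. First, $n$ pixels are chosen at equal spacing on the left boundary of the edge map as starting pixels. By rule 1), $n$ must be greater than 5; to limit the computational load, $n$ is generally less than 80. Then, following the remaining rules, tracing proceeds column by column from the left boundary until the right boundary is reached, which gives the line-trace map. The edge map is now divided by the $n$ line traces into $n + 1$ finite regions, as shown in Fig. 2.

Fig. 2 Schematic view of the line tracing workflow

To obtain the optimal line tracer, an “energy function” is used. It is modeled by an edge tracing constraint, a smoothness constraint, and an elasticity constraint, defined as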

$ {E_{lt}}\left( {x, y} \right) = {\rm{exp}}( - edge\left( {x, y} \right)/a) $

(1)

$ {E_s}\left( {x, y} \right) = {d_s}\left( {x, y} \right)/b $

(2)

$ {E_e}\left( {x, y} \right) = {d_e}\left( {x, y} \right)/c $

(3)

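where the parameters $a$, $b$, and $c$ depend on the characteristics of the input image. ${E_{lt}}\left( {x, y} \right)$ is the edge tracing constraint, which steers the trace toward “strong edges”; $edge\left( {x, y} \right)$ is the edge value at pixel $\left( {x, y} \right)$ and $a$ is its control parameter. ${E_s}\left( {x, y} \right)$ is the smoothness constraint, which limits abrupt distortion in the vertical direction; ${d_s}\left( {x, y} \right)$ is the vertical distance between the current pixel and the candidate pixel at $\left( {x, y} \right)$ and $b$ is its control parameter. ${E_e}\left( {x, y} \right)$ is the elasticity constraint, which limits severe vertical drift along a single trace so that the current vertical position does not stray far from the starting vertical position on the left boundary; ${d_e}\left( {x, y} \right)$ is the vertical distance between the starting position on the left boundary and the candidate pixel and $c$ is its control parameter.

Finally, the position of the next line-trace pixel is determined by

$ y' = {\rm{arg}}\;\mathop {{\rm{min}}}\limits_y \left\{ {\alpha {E_{lt}}\left( {x, y} \right) + \beta {E_s}\left( {x, y} \right) + \gamma {E_e}\left( {x, y} \right)} \right\} $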

(4)

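Eq. (4) selects the candidate that minimizes the weighted sum of the three constraints, where $\alpha $, $\beta $, and $\gamma $ are the weighting coefficients. Once line tracing is completed with Eq. (4), rule 5) requires that the finite regions separated by the line traces be assigned depth values. Depth is assigned to the line-trace map according to the “far at the top, near at the bottom” model.

As shown in Fig. 3, the lower curve is the $i$-th line trace and the upper curve is the $(i + 1)$-th line trace. The shaded region between them is assigned the value $255 - {d_{{\rm{depth}}}} \times i$, where ${d_{{\rm{depth}}}} = {\rm{Int}}\left[{\frac{{255}}{n}} \right]$ and Int[] is the integer-part function. Assignment proceeds strictly from the bottom of the line-trace map upward: the bottom of the image has depth 0 (nearest, gray level 255), and the region at the highest trace line receives the maximum depth (farthest, lowest gray level).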
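The depth assignment rule can be illustrated for a single image column. The sketch below is hypothetical: `column_depths` and its arguments are not names from the paper, row indices are assumed to grow downward, and a pixel lying exactly on a trace line is assumed to belong to the region below that line.

```python
def depth_step(n):
    """d_depth = Int[255 / n] for n line traces."""
    return 255 // n


def column_depths(height, line_rows):
    """Gray level for every pixel of one column.

    line_rows holds the row index at which each line trace crosses this
    column. A pixel with i trace lines strictly below it receives
    255 - d_depth * i, so lower regions are brighter (nearer).
    """
    step = depth_step(len(line_rows))
    depths = []
    for r in range(height):
        lines_below = sum(1 for y in line_rows if y > r)
        depths.append(255 - step * lines_below)
    return depths
```

With 48 line traces the step is 5, so the gray levels run from 255 in the bottom region down to 255 - 5×48 = 15 in the top region.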

Fig. 3 Schematic view of depth assignment

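2 Hardware implementation

2.1 Hardware implementation of the Sobel operator

The Sobel operator consists of two 3×3 templates, so each computation needs a 3×3 block of data. The grayscale image, however, is read in row by row, so the data must be buffered and assembled into the 3×3 window that the Sobel templates require. The module is implemented as a five-stage pipeline whose workflow is shown in Fig. 4.

Fig. 4 Sobel data pipeline

In stage T1, grayscale image data are loaded from memory into three 32-bit shift registers. In stage T2, driven by a common shift signal, the data in the three 32-bit shift registers are loaded into the 3×3 template shift register. In stage T3, the template data are combined with the Sobel operators to obtain the horizontal and vertical gradient values. In stage T4, the absolute values of the T3 results are taken. In stage T5, the result is held in the 32-bit staging register that feeds the bus.

Because the pipeline has five stages, the first byte of data enters the 32-bit staging register in the fifth processing stage. After three more stages the staging register holds 32 valid bits; pipeline start-up is then complete, having taken eight processing stages, and the pipeline enters steady state. In steady state, one valid 32-bit word is produced every four processing stages. When no more valid source image data are detected, loading of source data is inhibited and the pipeline enters the drain phase, in which four more processing stages flush the remaining 32-bit data out of the pipeline.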
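The start-up and steady-state timing described above can be captured in a small model. This is only a back-of-the-envelope sketch: the names are hypothetical, the drain phase is not modeled, and treating one 1 024-pixel row as an uninterrupted pixel stream is an assumption.

```python
STARTUP_STAGES = 8     # stages until the first full 32-bit word (from the text)
STAGES_PER_WORD = 4    # steady-state stages per 32-bit word (from the text)


def word_ready_stage(k):
    """Processing stage at which the k-th 32-bit word (k >= 1) is complete."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return STARTUP_STAGES + STAGES_PER_WORD * (k - 1)


def stages_for_row(pixels_per_row=1024, pixels_per_word=4):
    """Stages needed to emit every 32-bit word of one uninterrupted row."""
    words = pixels_per_row // pixels_per_word
    return word_ready_stage(words)
```

Under these assumptions the first word is ready after 8 stages, the second after 12, and a full 1 024-pixel row after 1 028 stages, so the start-up latency is negligible compared with the steady-state throughput.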

Finally, the computed result is compared with the threshold, and the pixel value is set to 0 or 255 accordingly, as shown in Fig. 5.

Fig. 5 Block diagram of the Sobel algorithm

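2.2 Hardware implementation of depth map estimation

Depth map estimation comprises two functions: line tracing and depth assignment. The module takes edge values column by column, uses Eq. (4) to find the line-trace points in each column, and assigns depth values to the column just traced according to the address intervals defined by the line-trace points, finally producing the depth map. Because Eq. (4) involves a large number of divisions and exponential operations, which are ill-suited to hardware, the following equivalent processing is applied. In Eq. (1), the edge value $edge\left( {x, y} \right)$ ranges from 0 to 255, so ${E_{lt}}\left( {x, y} \right)$ lies in the interval (0, 1]. The exponential can therefore be approximated by a linear function to reduce hardware resource consumption: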

$ {{E'}_{lt}}\left( {x, y} \right){\rm{ }} = {\rm{ }}1 - edge\left( {x, y} \right)/256 $

(5)

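Substituting Eqs. (2), (3), and (5) into Eq. (4), and noting from Ref. [9] that the weighting coefficients are typically $\alpha $ : $\beta $ : $\gamma $ = 4 : 3 : 3 with $b = c = {H_i}/4$, where ${H_i}$ is the image height, the 1 024×768 images processed here give $b = c = 192$, and Eq. (4) becomes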

$ \begin{array}{l} y'' = {\rm{arg}}\;\mathop {{\rm{min}}}\limits_y \left\{ {\alpha {{E'}_{lt}}\left( {x, y} \right) + \beta {E_s}\left( {x, y} \right){\rm{ }} + } \right.\\ \left. {\gamma {E_e}\left( {x, y} \right)} \right\} = {\rm{arg}}\;\mathop {{\rm{min}}}\limits_y \left\{ {4 \times \left( {1 - edge\left( {x, y} \right)/256} \right) + } \right.\\ \left. {3 \times \left( {{d_s}\left( {x, y} \right)/192} \right) + 3 \times \left( {{d_e}\left( {x, y} \right)/192} \right)} \right\} = \\ {\rm{arg}}\;\mathop {{\rm{min}}}\limits_y \left\{ {256 - edge\left( {x, y} \right) + } \right.\\ \left. {{d_s}\left( {x, y} \right) + {d_e}\left( {x, y} \right)} \right\} \end{array} $

(6)

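Eq. (6) is the final expression used for the hardware implementation of line tracing. Its last step multiplies the objective by the positive constant 64, which leaves the minimizing $y$ unchanged. The result involves only additions, subtractions, and comparisons and is therefore well suited to hardware. Table 1 compares the operation counts before and after optimization.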
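The equivalence of the two forms in Eq. (6) can be checked numerically. The sketch below uses exact rational arithmetic so that floating-point rounding cannot affect the comparison; the function names are hypothetical, and the weights 4 : 3 : 3 and $b = c = 192$ are the values quoted in the text.

```python
from fractions import Fraction


def weighted_cost(edge, d_s, d_e):
    """Eq. (4) with the linear edge term of Eq. (5): 4:3:3 weights, b = c = 192."""
    return (4 * (1 - Fraction(edge, 256))
            + 3 * Fraction(d_s, 192)
            + 3 * Fraction(d_e, 192))


def simplified_cost(edge, d_s, d_e):
    """Final form of Eq. (6): additions and subtractions only."""
    return 256 - edge + d_s + d_e


def argmin_index(costs):
    """Index of the first minimum, as a running comparison would find it."""
    return min(range(len(costs)), key=costs.__getitem__)
```

Since `simplified_cost` equals 64 times `weighted_cost` for every input, both rank any set of candidates identically and therefore select the same pixel.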

Table 1 Operation counts before and after optimization (unit: thousand operations)

Algorithm	Addition	Multiplication	Division	Exponential	Comparison
Original algorithm	1 571	2 357	2 357	786	785
Optimized algorithm	2 357	0	0	0	785

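The number of line traces $n$ is set to 48 in this paper, which divides the image into 49 finite regions, and the depth step between adjacent line traces is $ {d_{{\rm{depth}}}} = {\rm{Int}}\left[{\frac{{255}}{{48}}} \right] = 5$. Fig. 6 shows the implementation block diagram of depth map estimation.

Fig. 6 Block diagram of the depth map estimation algorithm

Using Eq. (6), the matching points in the next column for the 48 current coordinate points are computed in parallel, and depth values are assigned by address interval according to the matching coordinates. Taking one current coordinate point as an example, the flow is shown in Fig. 7. 1) Input the first edge value of the next column, compute the corresponding cost of Eq. (6), denoted ${{y''}_1}$, store it in the current-result register, and record its address in BLOCK RAM. 2) Input the next edge value, compute the corresponding cost ${{y''}_2}$, compare ${{y''}_1}$ with ${{y''}_2}$, store the smaller value in the current-result register, and record its address in BLOCK RAM. 3) Check whether all edge values of the current column have been processed; if so, move on to the next column of the image, otherwise continue with the current column until it is finished. 4) Check whether the last column of the image has been processed; if so, line tracing for the whole image is complete, otherwise repeat steps 1) to 3). The 48 coordinate points execute this flow simultaneously to find their matching points. Depth is then assigned according to the matching coordinates: the module determines which address interval the BLOCK RAM write address falls in and writes the corresponding depth value to that interval. Matching points are found and depth values assigned column by column until every column of the image has been traversed, which completes depth map estimation.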
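The four-step flow above amounts to a running-minimum scan of each column. The following sketch traces a single line in software; it is not the parallel hardware design. The function names are hypothetical, `edge_columns` is assumed to be a list of columns (each a list of edge values from top to bottom), ties are assumed to resolve to the first candidate encountered, and the non-crossing rule 2) is not enforced.

```python
def next_trace_row(edge_column, current_row, start_row):
    """Scan one column and keep the running minimum of the Eq. (6) cost."""
    best_cost = None
    best_row = None
    for y, edge in enumerate(edge_column):
        d_s = abs(y - current_row)   # vertical distance to the current pixel
        d_e = abs(y - start_row)     # vertical distance to the starting pixel
        cost = 256 - edge + d_s + d_e
        if best_cost is None or cost < best_cost:
            best_cost, best_row = cost, y   # keep the smaller value and its address
    return best_row


def trace_line(edge_columns, start_row):
    """Trace one line from the left boundary to the right boundary."""
    rows = [start_row]
    for column in edge_columns[1:]:
        rows.append(next_trace_row(column, rows[-1], start_row))
    return rows
```

In the hardware, 48 such scans share the same incoming column of edge values, each with its own current and starting row, which is why only one pass over each column is needed.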

Fig. 7 Flow chart of the line tracing algorithm

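2.3 Hardware implementation of data buffering and row-column conversion

Because the embedded block RAM (EBR) inside the FPGA is limited, the image data must be buffered in external memory; a 32 M×32 bit SDRAM is used for this purpose. First, three consecutive image rows are read from the SDRAM and buffered in three separate BLOCK RAMs, which supplies the Sobel edge detector with correctly aligned data, as shown in Fig. 8.

Fig. 8 Data buffering for the Sobel algorithm

The edge map produced by Sobel edge detection is obtained row by row, whereas depth map estimation works column by column, so a row-column conversion is required.

As shown in Fig. 9, rows 1, 2, 3, and 4 of the edge map are buffered in row_RAM0, row_RAM1, row_RAM2, and row_RAM3, respectively. Because the SDRAM operates with a burst length of 4, one word is read from each of the four row_RAMs at the same time (the shaded parts of the row_RAMs in the figure). The four words are regrouped by byte position, that is, lowest 8 bits, second 8 bits, third 8 bits, and highest 8 bits (the yellow box in the figure marks the lowest 8 bits of the four words), into four new 32-bit words that form one burst, which is written to the SDRAM. All data in the row_RAMs are converted and written to the SDRAM in the same way. Meanwhile, newly detected edge data continue to be written into the row_RAMs. The image has 768 rows and four rows are handled at a time, so 192 groups are processed.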
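The byte regrouping described above is a 4×4 byte transpose. The sketch below illustrates it on plain integers; `transpose_burst` is a hypothetical name, and the byte order (pixel k of a row in bits 8k to 8k+7, row k of a column in bits 8k to 8k+7) is an assumption, since the text does not fix it.

```python
def transpose_burst(row_words):
    """Regroup four 32-bit row words into four 32-bit column words.

    row_words[r] packs four edge pixels of row r, one per byte.
    The returned word j packs the pixels of column j from rows 0..3,
    and the four returned words form one length-4 SDRAM burst.
    """
    if len(row_words) != 4:
        raise ValueError("expected one word from each of the four row_RAMs")
    burst = []
    for col in range(4):
        word = 0
        for row, row_word in enumerate(row_words):
            byte = (row_word >> (8 * col)) & 0xFF
            word |= byte << (8 * row)
        burst.append(word)
    return burst
```

Applying the function twice returns the original four words, which is a quick way to confirm that no pixel is lost or duplicated by the regrouping.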

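Fig. 9 Block diagram of row-column conversion

Each SDRAM read is a burst of four consecutive words, which returns data from four consecutive columns of the edge map (red box in the SDRAM part of the figure); these are stored in four col_RAMs (red box in the col_RAM part). The second burst reads the first four columns of the second group (blue box in the SDRAM part) and likewise stores them in the four col_RAMs (blue box in the col_RAM part). Continuing in this way reads out the first four columns of the whole edge map into the four col_RAMs, each of which holds one column of edge data. Note that each group stores four rows of data and therefore occupies 4×256 = 1 024 addresses. Burst reads start at base address ${A_0}$, and the base address is increased by 1 024 for each burst until the first four columns of the edge map have been read. The base address is then set to ${A_0}$+4 and the process repeats until the whole image has been read, after which the base address is reset.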
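The read addressing can be sketched as a generator of burst start addresses. The name `burst_read_addresses` and its parameters are hypothetical; the constants (192 groups, 1 024 addresses per group, burst length 4) come from the text, and word-granular addressing is an assumption.

```python
def burst_read_addresses(base=0, groups=192, words_per_group=1024, burst_len=4):
    """Yield burst start addresses in column-major read order.

    The inner loop steps through the 192 four-row groups (stride 1 024),
    which reads four full columns; the outer loop then advances the base
    address by 4 to move on to the next four columns.
    """
    for block in range(words_per_group // burst_len):
        for group in range(groups):
            yield base + burst_len * block + words_per_group * group
```

For a 1 024×768 image this produces 256×192 = 49 152 bursts, each covering a distinct set of four words, so every stored word is read exactly once.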

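3 Verification and analysis

The hardware verification platform is the Altera DE2-115 development board, which provides two 64 MB SDRAM chips with 16-bit data buses, 3 888 Kbit of EBR (432 M9K memory blocks), 114 480 logic elements (LEs), and four PLLs.

As shown in Fig. 10, verification proceeds as follows: 1) a 1 024×768 grayscale image is downloaded through the serial port into Bank0 of the SDRAM; 2) Sobel edge detection is performed; 3) the edge map undergoes row-column conversion and the converted data are stored in Bank1 of the SDRAM; 4) depth map estimation is performed on the data in Bank1; 5) the estimated data are written to Bank3 of the SDRAM; 6) the data are sent to the PC through the serial port; 7) the depth map is plotted in MATLAB.

Fig. 10 Block diagram of hardware verification

3.1 Resource consumption and simulation results

Implementing this verification flow requires BLOCK RAM for data buffering. Table 2 lists the BLOCK RAM consumption of each module, and Table 3 lists the resource consumption of the whole system.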

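Table 2 BLOCK RAM consumption

Module	RAM size	Number of blocks	Consumption/kbit
Data preparation before edge detection	32 bit×256	3	24
Data buffering after edge detection	32 bit×256	4	32
Data preparation before line tracing	32 bit×192	4	24
Data preparation buffer before line tracing	8 bit×768	4	24
Depth assignment	8 bit×768	1	6
Total			110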

Table 3 Hardware resource consumption

Resource	Used	Available	Utilization/%
Logic elements	17 380	114 480	15
Pins	56	529	11
Memory bits	122 880	3 981 312	3
PLLs	1	4	25

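Fig. 11 shows the simulation of the Sobel edge detection algorithm. Because of the five-stage pipeline, data begin to appear in the sixth clock cycle; the computed data are then shifted into a 32-bit staging register and finally written to BLOCK RAM under control of the output-enable signal.

Fig. 11 Simulation of the Sobel algorithm

Fig. 12(a)(b) shows the simulation of the first line trace; the 48 line traces are computed in parallel, and the others behave identically. The image is read in column by column, and the current value is computed and compared with the stored minimum: if it is smaller, it replaces the minimum; otherwise the minimum is left unchanged, as shown in Fig. 12(a). After the whole column has been traversed, the minimum over the column is obtained, as shown in Fig. 12(b). Simulation shows that at a 100 MHz clock frequency, Sobel edge detection processes four pixels every 100 ns, so a 1 024×768 image takes [(1 024×768)/4]×100 ns = 19 660 800 ns. The edge data are then read from the SDRAM for depth map estimation; each column takes 10 240 ns, so a 1 024×768 image takes 1 023×10 240 ns = 10 475 520 ns. The total is 30 136 320 ns per frame, which gives an estimated frame rate of 33.18 frames/s. This exceeds the 24 frames/s real-time requirement of 3D video applications such as 3DTV^[20] and lays a foundation for later video processing.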
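The frame-rate estimate can be reproduced with a few lines of arithmetic. The sketch below only restates the timing figures quoted in the text; the function name is hypothetical, and running the two phases strictly one after the other (no overlap) is the assumption implied by adding the two durations.

```python
def estimated_frame_rate(width=1024, height=768):
    """Return (frames per second, Sobel time in ns, depth estimation time in ns)."""
    sobel_ns = (width * height // 4) * 100   # four pixels every 100 ns
    depth_ns = (width - 1) * 10_240          # 10 240 ns per traced column
    fps = 1e9 / (sobel_ns + depth_ns)
    return fps, sobel_ns, depth_ns


fps, sobel_ns, depth_ns = estimated_frame_rate()
print(f"Sobel: {sobel_ns} ns, depth: {depth_ns} ns, rate: {fps:.2f} frames/s")
```

The Sobel phase accounts for roughly two thirds of the frame time, so it is the phase where additional parallelism would raise the frame rate most.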

Fig. 12 Simulation of the line tracing algorithm

((a) running-minimum computation for a single line trace; (b) minimum value for a single line trace)

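3.2 Analysis of experimental results

To verify the effectiveness of the algorithm, one representative building image and one representative image of a person were selected. Depth information was extracted from each with the software and hardware implementations of the proposed algorithm and compared with depth maps based on motion estimation. The results are shown in Fig. 13.

Fig. 13 Actual processing results

((a) original image; (b) depth image based on motion estimation; (c) software implementation; (d) hardware implementation)

Comparing Fig. 13(b)(c)(f)(g) shows directly that the edges of the depth maps based on motion estimation are sharper than those of the proposed algorithm, with more burrs and a coarser image, while the depth maps from the hardware implementation of the proposed algorithm fall between the two. To quantify the hardware processing quality, the peak signal-to-noise ratio (PSNR) is computed as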

$ {\rm{PSNR}} = 10 \times {\rm{log}}\left( {\frac{{{{255}^2}}}{{{\rm{MSE}}}}} \right) $

(7)

$ {\rm{MSE}} = \frac{1}{{W \times H}}\sum\limits_x^W {\sum\limits_y^H {{{\left[{f\left( {x, y} \right)-f{\rm{ }}\prime \left( {x, y} \right)} \right]}^2}} } $

(8)

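where MSE is the mean squared error, $f\left( {x, y} \right)$ and $f'\left( {x, y} \right)$ are the pixel values of the reference image and the image under test at pixel $\left( {x, y} \right)$, and $W$ and $H$ are the width and height of the image.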
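Eqs. (7) and (8) translate directly into code. The sketch below is illustrative: the function names are hypothetical, images are assumed to be equally sized lists of rows of 8-bit values, the logarithm is taken as base 10 (the usual convention for PSNR in dB, which Eq. (7) does not state explicitly), and returning infinity for identical images is a choice made here.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images, Eq. (8)."""
    height, width = len(reference), len(reference[0])
    total = 0
    for ref_row, test_row in zip(reference, test):
        for f, g in zip(ref_row, test_row):
            total += (f - g) ** 2
    return total / (width * height)


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB for 8-bit images, Eq. (7)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf    # identical images
    return 10 * math.log10(255 ** 2 / error)
```

A higher PSNR means the image under test is closer to the reference, which is how the values in Table 4 should be read.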

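Table 4 lists the results for the two images. The depth maps based on motion estimation are comparable to those of the proposed algorithm, but their PSNR is slightly lower. The software and hardware implementations of the proposed algorithm yield essentially the same image quality; within the error margin, the hardware implementation extracts image depth information well and achieves the expected result.

Table 4 PSNR comparison of depth maps (unit: dB)

Image	Depth map based on motion estimation	Proposed algorithm, software	Proposed algorithm, hardware
Image 1	13.028	13.147	13.208 4
Image 2	10.94	11.072 8	10.980 4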

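4 Conclusion

For depth extraction in 2D-to-3D image conversion, this paper adopts the relative height depth cue method, simplifies the computation of the “energy function” through equivalent processing, implements the algorithm in the Verilog hardware description language, and verifies it on a complete verification platform built on an FPGA development board. The results show that the hardware design extracts depth information from 2D images with an estimated frame rate of 33.18 frames/s. The five-stage Sobel pipeline and the parallel line-trace computation raise processing efficiency, so the hardware design can be applied in real-time high-definition 2D-to-3D systems. As with methods based on motion estimation, the extracted depth maps still have sharp edges and burrs; future work will apply digital filtering to smooth the depth map and improve its quality.

References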

[1] Meesters L M J, Ijsselsteijn W A, Seuntiens P J H. A survey of perceptual evaluations and requirements of three-dimensional TV[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(3): 381–391. [DOI:10.1109/TCSVT.2004.823398]

[2] Feng Y, Ren J C, Jiang J M. Object-Based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications[J]. IEEE Transactions on Broadcasting, 2011, 57(2): 500–509. [DOI:10.1109/TBC.2011.2131030]

[3] Zhang Z Y, An P, Zhang Z J, et al. Overview on key technologies and application trends of 3DTV[J]. Video Engineering, 2010, 34(6): 4–6, 22. [张兆杨, 安平, 张之江, 等. 发展3DTV需解决的技术及其应用趋势[J]. 电视技术, 2010, 34(6): 4–6, 22. ] [DOI:10.16280/j.videoe.2010.06.009]

[4] Liu Y. The dilemma and outlet of VR commercial application from the perspective of communication studies[J]. Radio & TV Journal, 2017(2): 5–7. [刘羽. 从传播学视角看VR商业应用的困境与出路[J]. 视听, 2017(2): 5–7. ] [DOI:10.19395/j.cnki.1674-246x.2017.02.001]

[5] Lidegaard M, Larsen R F, Kraft D, et al. Enhanced 3D face processing using an active vision system[C]//Proceedings of the 9th International Conference on Computer Vision Theory and Applications. Lisbon, Portugal: Scitepress, 2014: 466-473. [DOI:10.5220/0004667904660473]

[6] Kim J, Kim T, Kim W J, et al. Depth camera for 3DTV applications[C]//Proceedings of SPIE 7237, Stereoscopic Displays and Applications XX. Bellingham, USA: SPIE, 2009: 72371I. [DOI:10.1117/12.806941]

[7] Fang W, He B W. Automatic view planning for 3D reconstruction and occlusion handling based on the integration of active and passive vision[C]//Proceedings of 2012 IEEE International Symposium on Industrial Electronics. Hangzhou: IEEE, 2012: 1116-1121. [DOI:10.1109/ISIE.2012.6237245]

[8] Patil S, Charles P. Review on 2D-to-3D image and video conversion methods[C]//Proceeding of 2015 International Conference on Computing Communication Control and Automation. Pune, India: IEEE, 2015: 728-732. [DOI:10.1109/ICCUBEA.2015.192]

[9] Yong J J, Baik A, Kim J, et al. A novel 2D-to-3D conversion technique based on relative height depth cue[C]//Proceedings of SPIE Volume 7237, Stereoscopic Displays and Applications XX. San Jose, California, United States: SPIE, 2009: 72371U. [DOI:10.1117/12.806058]

[10] Pei Q, Liu G F, Xu M Q. Motion estimation based on block feature classification[J]. Journal of Image and Graphics, 2011, 16(6): 933–938. [裴琴, 刘国繁, 徐美清. 基于块特征分类的运动估计算法[J]. 中国图象图形学报, 2011, 16(6): 933–938. ] [DOI:10.11834/jig.20110604]

[11] Liu T L, Mo Y M, Xu G B, et al. Depth estimation of monocular video using non-parametric fusion of multiple cues[J]. Journal of Southeast University: Natural Science Edition, 2015, 45(5): 834–839. [刘天亮, 莫一鸣, 徐高帮, 等. 多线索非参数化融合的单目视频深度估计[J]. 东南大学学报:自然科学版, 2015, 45(5): 834–839. ] [DOI:10.3969/j.issn.1001-0505.2015.05.004]

[12] Kuo T Y, Lo Y C. Depth estimation from a monocular view of the outdoors[J]. IEEE Transactions on Consumer Electronics, 2011, 57(2): 817–822. [DOI:10.1109/TCE.2011.5955227]

[13] Huang D D, Zhang Y X, Shi J H. 2D to 3D video depth filter based on motion estimation[J]. Journal of Xiamen University:Natural Science, 2013, 52(4): 473–478. [黄冬冬, 张贻雄, 石江宏. 基于运动估计的2D转3D视频深度滤波[J]. 厦门大学学报:自然科学版, 2013, 52(4): 473–478. ] [DOI:10.6043/j.issn.0438-0479.2013.04.008]

[14] Tai G Q. Research on real-time High definition 2D to 3D conversion system based on FPGA[D]. Chongqing: Chongqing University, 2013. [邰国钦. 基于FPGA的实时高清2D转3D系统研究[D]. 重庆: 重庆大学, 2013.]

[15] Cheng C C, Li C T, Chen L G. A 2D-to-3D conversion system using edge information[C]//Proceedings of 2010 Digest of Technical Papers International Conference on Consumer Electronics. Las Vegas, NV, USA: IEEE, 2010: 377-378. [DOI:10.1109/ICCE.2010.5418746]

[16] Gao W S, Zhang X G, Yang L, et al. An improved Sobel edge detection[C]//Proceedings of 2010 3rd International Conference on Computer Science and Information Technology. Chengdu, China: IEEE, 2010: 67-71. [DOI:10.1109/ICCSIT.2010.5563693]

[17] Seif A, Salut M M, Marsono M N. A hardware architecture of Prewitt edge detection[C]//Proceedings of 2010 IEEE Conference on Sustainable Utilization and Development in Engineering and Technology. Petaling Jaya, Malaysia: IEEE, 2010: 99-101. [DOI:10.1109/STUDENT.2010.5686999]

[18] Kim J, Lee S. Extracting major lines by recruiting zero-threshold Canny edge links along Sobel highlights[J]. IEEE Signal Processing Letters, 2015, 22(10): 1689–1692. [DOI:10.1109/LSP.2015.2400211]

[19] Wang X. Laplacian operator-based edge detectors[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(5): 886–890. [DOI:10.1109/TPAMI.2007.1027]

[20] Li H J, An P, Zhang Z Y, et al. FPGA-based real-time depth estimation for 3D video system[J]. Journal of Optoelectronics·Laser, 2014(5): 974–980. [李贺建, 安平, 张兆杨, 等. 基于FPGA的三维视频系统实时深度估计[J]. 光电子·激光, 2014(5): 974–980. ] [DOI:10.16136/j.joel.2014.05.027]