Hardware implementation algorithm of image depth estimation
2018, Vol. 23, No. 3, pp. 362-371
Received: 2017-08-17; Revised: 2017-11-08; Published in print: 2018-03-16
DOI: 10.11834/jig.170436
目的 (Objective)
In recent years, 3DTV (3-dimension television) and VR (virtual reality) technologies have developed rapidly, but the shortage of 3D content has become a bottleneck for their growth. To supply more 3D content quickly, existing 2D videos must be converted to 3D videos. Depth estimation is the key step in 2D-to-3D conversion; to meet the demanding real-time requirements of the conversion process, this paper proposes a hardware implementation scheme based on the relative height depth cue method.
方法 (Method)
First, Sobel edge detection is applied to the grayscale image to obtain an edge map, which is then processed by line tracing and depth assignment to complete depth estimation and produce a depth map. In the hardware implementation, Sobel edge detection adopts a five-stage pipeline and parallel line-trajectory computation, fully exploiting hardware parallelism to improve system throughput. In depth estimation, the "energy function" is simplified through equivalent processing, replacing the algorithm's many multiplication, division, and exponential operations with addition, subtraction, and comparison operations, thereby reducing hardware resource cost. The design also exploits the burst feature of SDRAM (synchronous dynamic random access memory) to perform row-column conversion, further saving hardware resources.
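The first algorithmic step, Sobel edge detection followed by thresholding into a binary edge map, can be sketched in software. This is a minimal NumPy illustration of the computation, not the paper's hardware design; the threshold value of 128 is an assumption, since the abstract does not state one.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)

def sobel_edges(gray, threshold=128):
    """Return a binary edge map from a 2-D grayscale array."""
    h, w = gray.shape
    mag = np.zeros((h, w), dtype=np.int32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = gray[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            gx = int(np.sum(win * GX))
            gy = int(np.sum(win * GY))
            # |gx| + |gy| approximates the gradient magnitude without
            # multipliers or square roots, a common hardware-friendly choice.
            mag[y, x] = abs(gx) + abs(gy)
    return (mag > threshold).astype(np.uint8)
```

The absolute-value approximation of the gradient magnitude is one reason Sobel detection maps well onto a short pipeline: each stage needs only shifts, additions, and comparisons.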
结果 (Result)
The algorithm was implemented on an FPGA (field programmable gate array), and two images were selected for depth information extraction. Comparing the software and hardware results of the proposed method with a depth-map extraction method based on motion estimation shows that the proposed algorithm extracts depth maps more effectively, and that the hardware implementation can extract depth information from 2D images with results consistent with the software implementation. At a 100 MHz clock frequency, the estimated frame rate reaches 33.18 frames/s.
结论 (Conclusion)
The proposed hardware implementation scheme can extract the depth information of a single image at an estimated frame rate far above the 24 frames/s required for real-time operation in 3D video applications such as 3DTV. It offers good real-time performance and portability, laying a foundation for subsequent video processing.
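The reported 33.18 frames/s at 100 MHz can be cross-checked against the 1 024×768 image size used in the experiments:

```python
# Sanity check of the reported throughput from figures given in the abstract.
clock_hz = 100e6          # 100 MHz system clock
fps = 33.18               # reported estimated frame rate
pixels = 1024 * 768       # image resolution used in the experiments

cycles_per_frame = clock_hz / fps          # about 3.01 million cycles
cycles_per_pixel = cycles_per_frame / pixels
print(round(cycles_per_pixel, 2))          # → 3.83
```

About 3.8 clock cycles per pixel over the whole pipeline, including the SDRAM row-column conversion, which is plausible for a streaming pipelined design.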
Objective
In recent years, 3D television (3DTV) and virtual reality technology have developed rapidly, but the shortage of 3D resources has become the bottleneck of this technology's development. Existing 2D videos must be converted to 3D videos to provide more 3D resources quickly. Depth estimation is the key step of 2D-to-3D conversion, and hardware implementation is an effective way to meet the real-time requirements of the conversion process; however, most depth estimation algorithms are highly complex to implement in hardware. Considering both depth estimation quality and ease of implementation, this study proposes a hardware implementation scheme based on the relative height depth cue method to achieve high-speed processing while saving hardware resources.
Method
At the algorithm level, a color image is first converted to grayscale, and an edge map is obtained by applying Sobel edge detection to the grayscale image. Line trajectories are then extracted by a line tracing algorithm, and the depth map is produced by assigning depth values along the line trajectories. In the hardware implementation, Sobel edge detection uses a five-stage pipeline design and parallel trajectory calculation to maximize the parallelism of the hardware and improve system efficiency. In depth estimation, the energy function is simplified by equivalent processing, so that a large number of multiplication, division, and exponential operations are replaced by addition, subtraction, and comparison operations; more than 2 300 multiplications and divisions and more than 780 exponential operations are eliminated, thereby reducing hardware resource cost. Given that line tracing and depth assignment are performed column by column, the edge map must be converted from row order to column order. In this design, the SDRAM burst feature is used to complete the row-column conversion and save system hardware resources. The hardware implementation scheme is written in Verilog HDL, a hardware description language.
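The abstract does not give the energy function itself. The following toy example only illustrates why such an equivalent simplification is safe: when a cost is a monotonic function of a simpler quantity (here a hypothetical exponential of a distance divided by a scale), comparing the simpler quantity selects the same minimizer, so the exponentiation and division can be dropped from the hardware entirely.

```python
import math

# Hypothetical exponential cost; the paper's actual energy function is not
# given in the abstract. This only demonstrates that a monotonic transform
# preserves the result of a cost comparison.
def full_cost(dist, sigma=2.0):
    return math.exp(dist / sigma)   # needs a divider and an exponential unit

def simplified_cost(dist):
    return dist                     # needs only a comparator

candidates = [5, 2, 7, 3]           # hypothetical per-candidate distances
best_full = min(range(len(candidates)), key=lambda i: full_cost(candidates[i]))
best_simple = min(range(len(candidates)), key=lambda i: simplified_cost(candidates[i]))
assert best_full == best_simple == 1  # exp() is monotonic, so the argmin agrees
```

Because only the ordering of costs matters for selecting a trajectory or depth value, every exponential comparison in such a scheme collapses to an integer comparison, which is how thousands of expensive operations can be removed without changing the output.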
Result
2
The study selects two typical images
including buildings and people
and verifies the algorithm based on the Altera DE2-115 FPGA platform to verify the feasibility of the hardware implementation method. The verification method is as follows:First
the design with VERILOG-HDL is simulated with QUARTUS-Ⅱ.A grayscale picture with a size of 1 024×768 pixels is downloaded to FPGA through the serial port
and the depth map is estimated by FPGA. The data are later sent to the PC terminal through a serial port
and the depth map is drawn by MATLAB. Simulation and verification results show that the proposed hardware implementation method can extract the depth of 2D images correctly
and the estimated frame rate is up to 33.18 fps at 100 MHz clock frequency.Finally
the hardware processing effect is compared with the software processing effect of this method and the typical motion estimation algorithm
and the peak signal-to-noise ratio(PSNR) after image processing is calculated. Experimental results show that the PSNR of the three methods for the building picture is 13.147
13.028
and 13.208 4 and that the PSNR of the three methods for the character image is 11.072 8
10.94
and 10.980 4. Thus
the proposed algorithm is more effective than the motion estimation method
and the hardware processing method can achieve the depth of 2D image extraction
which is consistent with the software processing.
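PSNR, used above to compare the three outputs, is a standard fidelity measure; the reference images used in the comparison are not specified in the abstract. A generic implementation for 8-bit images looks like:

```python
import math

def psnr(ref, img, peak=255):
    """PSNR in dB between two equally sized 8-bit images (lists of rows)."""
    n = 0
    se = 0
    for r_row, i_row in zip(ref, img):
        for r, i in zip(r_row, i_row):
            se += (r - i) ** 2
            n += 1
    mse = se / n
    if mse == 0:
        return float('inf')        # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

For example, two 2×2 images differing uniformly by 16 gray levels give an MSE of 256 and a PSNR of about 24.05 dB.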
Conclusion
The proposed hardware implementation can extract the depth information of an image. The estimated frame rate exceeds the real-time requirement of 3D video applications, such as 24 frames/s for 3DTV, so the scheme has good real-time performance and portability, establishing a foundation for subsequent video information processing. However, as with methods based on other typical algorithms such as motion estimation, the edges of the extracted depth map remain sharp and contain burrs. Future work could apply a digital filter to smooth the depth map and improve its quality.