Near real time linear stereo cost aggregation on GPU

Chen Bin; Chen Heping; Li Xiaohui

doi:10.11834/jig.20141010

Image Understanding and Computer Vision

2014, Vol. 19, No. 10, pp. 1481-1489
Published online: 2014-10-13

Print publication: 2014

Chen Bin, Chen Heping, Li Xiaohui. Near real time linear stereo cost aggregation on GPU[J]. Journal of Image and Graphics (中国图象图形学报), 2014, 19(10): 1481-1489. DOI: 10.11834/jig.20141010.

Abstract (Chinese version)

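In recent years, research in binocular stereo vision has gradually shifted its focus toward strategies for real-time operation, and stereo cost aggregation is the most complex and time-consuming step in stereo vision. To address this, we propose a near real-time stereo cost aggregation algorithm based on general-purpose computing on GPUs (GPGPU). A local algorithm whose matching accuracy approaches that of global matching algorithms, linear stereo matching, is chosen as the cost aggregation strategy. Guided by the principles of linear cost aggregation, the computation flow of its main steps (cost computation, mean filtering, coefficient computation, etc.) is parallelized and optimized in a targeted way. On the same test samples, the proposed method computes the cost matrix in less time on an NVIDIA GTX780 platform, and compared with the original CPU implementation, cost aggregation efficiency improves by a factor of several tens on average. This real-time stereo cost aggregation method provides an efficient and reliable way to obtain high-quality stereo depth information in real time on general-purpose personal computers.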
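The three steps named above (cost computation, mean filtering, coefficient computation) can be illustrated with a minimal CPU reference sketch in Python for a single disparity slice. It is not the authors' CUDA code. It assumes a guided-filter-style linear model, in which every window has coefficients a and b and the filtered cost is approximately a·I + b for the guide image I; the truncated absolute-difference cost, the border clamping, and the parameters `tau`, `r` and `eps` are illustrative assumptions, and all function names are hypothetical. Every output pixel depends only on its own window, which is the property that lets a GPU assign one thread per pixel.

```python
# Minimal CPU reference sketch of linear (guided-filter-style) cost aggregation
# for ONE disparity slice. Assumptions, not the paper's code: grayscale images as
# lists of lists of floats, truncated absolute-difference cost, square windows of
# radius r clamped at the image border, and the linear model q = a * I + b with
# regularizer eps. All function names are hypothetical.

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window clamped to the image (naive loops)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Each output pixel depends only on its own window: one GPU thread each.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            out[y][x] = sum(img[yy][xx] for yy in ys for xx in xs) / (len(ys) * len(xs))
    return out


def cost_slice(left, right, d, tau=0.1):
    """Step 1, cost computation: min(|I_L(x) - I_R(x - d)|, tau); tau where x - d < 0."""
    return [[min(abs(lrow[x] - rrow[x - d]), tau) if x - d >= 0 else tau
             for x in range(len(lrow))]
            for lrow, rrow in zip(left, right)]


def elementwise(f, a, b):
    """Apply f pixel by pixel to two images of equal size."""
    return [[f(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def aggregate_slice(guide, cost, r=1, eps=1e-4):
    """Steps 2 and 3: box-mean filtering, then per-window coefficients a and b."""
    h, w = len(guide), len(guide[0])
    mean_i = box_mean(guide, r)
    mean_p = box_mean(cost, r)
    mean_ii = box_mean(elementwise(lambda p, q: p * q, guide, guide), r)
    mean_ip = box_mean(elementwise(lambda p, q: p * q, guide, cost), r)
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var_i = mean_ii[y][x] - mean_i[y][x] ** 2
            cov_ip = mean_ip[y][x] - mean_i[y][x] * mean_p[y][x]
            a[y][x] = cov_ip / (var_i + eps)
            b[y][x] = mean_p[y][x] - a[y][x] * mean_i[y][x]
    # Average the coefficients of every window covering a pixel, then apply them.
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)]
            for y in range(h)]
```

Calling `aggregate_slice` once per candidate disparity `d` would build the aggregated cost volume, from which a per-pixel winner-take-all choice gives a disparity map (not shown). Note that `box_mean` loops over the whole window, so its per-pixel work grows as (2r+1)^2; a separable filter reduces this to work linear in r.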

Abstract (English version)

Practical stereo vision depends on feasible approaches to real-time or hardware implementation. Cost aggregation, the most complex part of a stereo matching algorithm, substantially affects the overall running time. This study therefore proposes a novel parallelization strategy that maps stereo cost aggregation onto graphics processing units (GPUs) using the compute unified device architecture (CUDA). The linear stereo matching algorithm is selected as the cost aggregation strategy of the proposed approach. Linear stereo matching, which has constant complexity, can achieve more accurate disparity maps than global disparity optimization methods. Although its computational complexity is considerably lower than that of most global approaches, linear stereo matching, even when optimized with effective strategies, has yet to meet the real-time or near real-time requirements of practical applications. The parallelization strategy introduced in this study is based on a separable filter whose complexity is linear in the filter window size and whose efficiency on GPU platforms has been proven. The computation of each step of the cost aggregation (cost computation, mean filtering, and coefficient computation) is reformulated, and the different types of GPU memory are used appropriately. Several parallelization optimizations are proposed to increase the degree of parallelism and the data throughput. With these optimizations, the computation of each CUDA thread is independent of all other threads, which maximizes the degree of parallelism. The optimizations also reduce the per-thread complexity from quadratic to linear in the window radius, further improving efficiency. In the final implementation, memory-access efficiency and data throughput are also dramatically improved by caching data in texture or shared memory where appropriate. The experimental results show that the proposed strategy is effective and efficient: by exploiting the parallel computing performance of GPUs, stereo cost aggregation is dramatically accelerated. Compared with the original CPU implementation accelerated by the integral image technique, our CUDA implementation on an NVIDIA GTX780 GPU computes an accurate cost matrix for the same stereo image pairs in significantly less time (under 80 ms), improving average efficiency tenfold. Our approach also outperforms other real-time or near real-time stereo cost aggregation implementations on GPUs: it outperforms previous constant-time stereo solutions and produces accurate results comparable with those of adaptive-weight aggregation on GPUs with CUDA. It also provides an efficient and feasible way to obtain accurate disparity maps on general PC platforms in real time.
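The per-thread complexity claim can be illustrated with a small Python sketch that computes the same clamped box sum three ways. A direct (2r+1) x (2r+1) window costs (2r+1)^2 reads per pixel, which grows quadratically in r; a separable horizontal-then-vertical pass, the strategy described above, needs only 2(2r+1); and a summed-area table (integral image), the technique the CPU baseline is said to use, needs a constant number of reads after a prefix pass. Border clamping and all function names are assumptions for illustration, and the sketch models neither CUDA kernels nor shared or texture memory.

```python
# CPU sketch comparing three ways to compute a (2r+1) x (2r+1) box sum with the
# window clamped at the image border (an assumption; the paper's border rule is
# not given). Function names are hypothetical.

def box_sum_naive(img, r):
    """Direct 2-D window: up to (2r+1)**2 reads per output pixel."""
    h, w = len(img), len(img[0])
    return [[sum(img[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def box_sum_separable(img, r):
    """Horizontal pass, then vertical pass: up to 2 * (2r+1) reads per pixel.

    On a GPU each pass could be one kernel with one thread per output pixel;
    every thread reads only the previous pass's output, so threads are independent.
    """
    h, w = len(img), len(img[0])
    rows = [[sum(row[max(0, x - r):min(w, x + r + 1)]) for x in range(w)]
            for row in img]
    return [[sum(rows[yy][x] for yy in range(max(0, y - r), min(h, y + r + 1)))
             for x in range(w)]
            for y in range(h)]


def box_sum_integral(img, r):
    """Summed-area table (integral image): O(1) reads per pixel after a prefix pass."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):  # sequential prefix pass in this sketch
        for x in range(w):
            s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y][x] = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
    return out


def reads_per_interior_pixel(r):
    """(naive, separable) read counts for a pixel whose window is not clipped."""
    k = 2 * r + 1
    return k * k, 2 * k


if __name__ == "__main__":
    img = [[(3 * y + 5 * x) % 7 for x in range(9)] for y in range(7)]
    for r in range(4):
        assert box_sum_naive(img, r) == box_sum_separable(img, r) == box_sum_integral(img, r)
    print(reads_per_interior_pixel(3))  # prints (49, 14)
```

The summed-area table reaches constant work per pixel, but in this sketch it is built by a sequential prefix pass; the separable passes instead produce each output from the previous pass alone, which fits the stated goal of fully independent CUDA threads.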