Near real-time linear stereo cost aggregation on GPU

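Chen Bin1, Chen Heping2, Li Xiaohui1 (1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081; 2. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430074)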

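Abstract
Objective In recent years, research on stereo vision has increasingly shifted toward "real-time" strategies, and stereo cost aggregation is the most complex and most time-consuming step in stereo vision. This paper therefore proposes a near real-time stereo cost aggregation algorithm based on general-purpose computing on GPUs (GPGPU). Method Linear stereo matching, a local algorithm whose matching accuracy approaches that of global matching algorithms, is chosen as the cost aggregation strategy. Following the principle of linear cost aggregation, the computation of its main steps (cost computation, mean filtering, coefficient solving, and so on) is parallelized with targeted optimizations. Result On the same test samples, the proposed method computes the cost matrix in less time on an NVIDIA GTX780 platform, and compared with the original CPU implementation, the efficiency of cost aggregation improves by a factor of several tens on average. Conclusion The real-time stereo cost aggregation method provides an efficient and reliable way to obtain high-quality stereo depth information in real time on a general-purpose personal PC.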
Keywords
Near real-time linear stereo cost aggregation on GPU

Chen Bin1, Chen Heping2, Li Xiaohui1(1.School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China;2.School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430074, China)

Abstract
Objective Practical stereo vision depends on approaches that are feasible for real-time or hardware implementation. Cost aggregation, the most complex part of a stereo matching algorithm, substantially affects the overall running time. This study therefore proposes a novel parallelization strategy that maps stereo cost aggregation onto graphics processing units (GPUs) using the compute unified device architecture (CUDA). Method The linear stereo matching algorithm is selected as the stereo cost aggregation strategy. Linear stereo matching is a local method with constant complexity whose disparity maps approach the accuracy of global disparity optimization methods. Although its computational complexity is considerably lower than that of most global approaches, linear stereo matching, even when optimized with effective strategies, still falls short of the real-time or near real-time performance that practical applications require. The parallelization strategy introduced in this study is based on a separable filter, which has linear complexity in the filter window size and proven efficiency on GPU platforms. The computation of each step of the cost aggregation (cost computation, mean filtering, and coefficient computation) is reformulated, and each type of GPU memory is used where it fits best. Several parallelization optimizations are proposed to increase the degree of parallelism and the data throughput. With these optimizations, the computation of each CUDA thread is independent of all other threads, which maximizes the degree of parallelism. The optimizations also reduce the per-thread complexity from quadratic to linear in the window radius, which further improves efficiency. Memory access efficiency and data throughput are also substantially improved in the final implementation, which caches data in texture or shared memory where appropriate.
Result Experimental results show that the proposed strategy is effective and efficient. By exploiting the parallel computing performance of GPUs, we substantially accelerate stereo cost aggregation. Compared with the original CPU implementation accelerated by integral images, our CUDA implementation on an NVIDIA GTX780 GPU produces, on the same stereo image pairs, an accurate cost matrix in a significantly shorter running time (less than 80 ms) and improves the average efficiency tenfold. Our approach also outperforms other real-time or near real-time stereo cost aggregation implementations on GPUs. Conclusion The proposed approach outperforms previous constant-time stereo solutions and produces accurate results comparable with those of adaptive weight aggregation on GPUs with CUDA. It also provides an efficient and feasible way to obtain an accurate disparity map in real time on general PC platforms.
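The three steps named in the Method (cost computation, mean filtering, coefficient computation) can be illustrated with a small CPU reference sketch. It is not the authors' CUDA implementation. It assumes that the linear aggregation follows the usual guided-filter form, in which each cost slice is fitted per window as a linear function of the guide image, and it assumes clamped windows at the image borders. The names `box_1d`, `mean_filter`, `aggregate_slice`, and the regularizer `eps` are hypothetical. The mean filter is written as two 1D passes, so each output value costs a number of operations linear in the window radius, and each output pixel depends only on its inputs, which is the property that lets one GPU thread handle one pixel.

```python
def box_1d(values, r):
    """Mean over a window of radius r along one line, clamped at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def mean_filter(img, r):
    """Separable mean filter: a horizontal pass, then a vertical pass."""
    rows = [box_1d(row, r) for row in img]                # horizontal pass
    cols = [box_1d(list(col), r) for col in zip(*rows)]   # vertical pass on columns
    return [list(row) for row in zip(*cols)]              # transpose back


def aggregate_slice(guide, cost, r, eps=1e-4):
    """Aggregate one cost slice (one disparity) with a per-window linear model.

    Assumed guided-filter form: cost ~ a * guide + b inside each window.
    """
    h, w = len(guide), len(guide[0])

    def prod(u, v):
        return [[u[y][x] * v[y][x] for x in range(w)] for y in range(h)]

    mean_i = mean_filter(guide, r)
    mean_p = mean_filter(cost, r)
    mean_ip = mean_filter(prod(guide, cost), r)
    mean_ii = mean_filter(prod(guide, guide), r)

    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var_i = mean_ii[y][x] - mean_i[y][x] ** 2
            cov_ip = mean_ip[y][x] - mean_i[y][x] * mean_p[y][x]
            a[y][x] = cov_ip / (var_i + eps)               # coefficient computation
            b[y][x] = mean_p[y][x] - a[y][x] * mean_i[y][x]

    mean_a, mean_b = mean_filter(a, r), mean_filter(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)]
            for y in range(h)]
```

In a full pipeline this function would be called once per candidate disparity, with `cost` holding the raw matching cost of that disparity. A cost slice that is constant stays unchanged, since the fitted slope is zero in every window.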
Keywords
