Chen Bin, Chen Heping, Li Xiaohui. Near real time linear stereo cost aggregation on GPU[J]. Journal of Image and Graphics, 2014, 19(10): 1481-1489. DOI: 10.11834/jig.20141010.
Stereo vision depends on approaches that are feasible for real-time and hardware implementation. Cost aggregation, the most complex part of a stereo matching algorithm, substantially affects the overall running time. Therefore, this study proposes a novel parallelization strategy that maps stereo cost aggregation onto graphics processing units (GPUs) using the compute unified device architecture (CUDA). Linear stereo matching is selected as the cost aggregation strategy in the proposed approach. Linear stereo matching, whose complexity is constant, can achieve more accurate disparity maps than global disparity optimization methods. Although its computational complexity is considerably lower than that of most global approaches, linear stereo matching, even when optimized with effective strategies, has yet to reach the real-time or near-real-time performance required by practical applications. The parallelization strategy introduced in this study is based on a separable filter whose complexity is linear in the filter window size and whose efficiency on GPU platforms is well established. The computation of each step of the cost aggregation (cost computation, mean filtering, and coefficient computation) is reformulated, and the different types of GPU memory are used appropriately. This study also proposes several parallelization optimizations to increase the degree of parallelism and the data throughput. With these optimizations, the computation of each CUDA thread is independent of all other threads, which maximizes the degree of parallelism. The optimizations also reduce the per-thread complexity from quadratic to linear in the window radius, further improving efficiency. In our final implementation, memory access efficiency and data throughput are also dramatically improved, with data cached in texture or shared memory in certain circumstances. Experimental results show that the proposed strategy is both effective and efficient: by exploiting the parallel computing performance of GPUs, we dramatically accelerate stereo cost aggregation. On the same stereo image pairs, our CUDA implementation on an NVIDIA GTX780 GPU produces an accurate cost matrix in significantly less time (under 80 ms) than the original CPU implementation accelerated with integral images, improving average efficiency tenfold. Our approach also outperforms other real-time or near-real-time GPU implementations of stereo cost aggregation as well as previous constant-time stereo solutions, and its accuracy is comparable to that of adaptive-weight aggregation on GPUs with CUDA. It also provides an efficient and feasible way to obtain accurate disparity maps in real time on general-purpose PC platforms.
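The following is a minimal CPU-side sketch in C++ of the separable mean filter and the coefficient computation. It assumes that the linear stereo matching aggregation follows the guided-filter formulation. In that formulation, per-window linear coefficients a and b are computed from box means and then averaged again. The sketch also assumes a single-channel guide image. The names `Image`, `meanPass`, `boxMean`, and `guidedAggregate`, and the parameters `r` (window radius) and `eps` (regularization), are hypothetical illustrations, not the authors' code.

`meanPass` computes every output pixel independently, which is the work a single CUDA thread would do. `boxMean` chains a horizontal pass and a vertical pass, so each pixel reads 2(2r+1) samples instead of (2r+1)^2. This is the quadratic-to-linear reduction in window radius. GPU-specific details, such as thread blocks and shared or texture memory caching, are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel float image.
struct Image {
    int w = 0, h = 0;
    std::vector<float> px;
    Image(int w_, int h_, float v = 0.f)
        : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, v) {}
    float& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    float at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// One 1-D box-mean pass of radius r, along x (horizontal) or y (vertical).
// Each output pixel depends only on the input, so every (x, y) could be one
// independent GPU thread doing O(r) reads. Border windows average only the
// samples that lie inside the image.
Image meanPass(const Image& in, int r, bool horizontal) {
    assert(r >= 0);
    Image out(in.w, in.h);
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float sum = 0.f;
            int count = 0;
            for (int k = -r; k <= r; ++k) {
                int xx = horizontal ? x + k : x;
                int yy = horizontal ? y : y + k;
                if (xx < 0 || yy < 0 || xx >= in.w || yy >= in.h) continue;
                sum += in.at(xx, yy);
                ++count;
            }
            out.at(x, y) = sum / count;  // count >= 1 because k = 0 is always inside
        }
    return out;
}

// 2-D box mean as two separable 1-D passes: 2(2r+1) reads per pixel
// instead of (2r+1)^2 for a direct window sum.
Image boxMean(const Image& in, int r) {
    return meanPass(meanPass(in, r, true), r, false);
}

// Guided-filter aggregation of one disparity slice p of the cost volume,
// using the grayscale guide I (assumed formulation, single-channel guide).
Image guidedAggregate(const Image& I, const Image& p, int r, float eps) {
    assert(I.w == p.w && I.h == p.h);
    Image Ip(I.w, I.h), II(I.w, I.h);
    for (std::size_t i = 0; i < I.px.size(); ++i) {
        Ip.px[i] = I.px[i] * p.px[i];
        II.px[i] = I.px[i] * I.px[i];
    }
    // Mean-filter step.
    Image mI = boxMean(I, r), mp = boxMean(p, r);
    Image mIp = boxMean(Ip, r), mII = boxMean(II, r);
    // Coefficient step: a = cov(I, p) / (var(I) + eps), b = mean(p) - a * mean(I).
    Image a(I.w, I.h), b(I.w, I.h);
    for (std::size_t i = 0; i < I.px.size(); ++i) {
        float var = mII.px[i] - mI.px[i] * mI.px[i];
        float cov = mIp.px[i] - mI.px[i] * mp.px[i];
        a.px[i] = cov / (var + eps);
        b.px[i] = mp.px[i] - a.px[i] * mI.px[i];
    }
    // Average the coefficients, then form the aggregated cost q = mean(a) * I + mean(b).
    Image ma = boxMean(a, r), mb = boxMean(b, r);
    Image q(I.w, I.h);
    for (std::size_t i = 0; i < I.px.size(); ++i)
        q.px[i] = ma.px[i] * I.px[i] + mb.px[i];
    return q;
}
```

To aggregate a full cost volume, `guidedAggregate` would be called once per disparity slice with the same guide. Because the slices are independent of one another, they offer another natural axis of GPU parallelism.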