Semi-global stereo matching algorithm based on feature fusion and its CUDA implementation
2018, Vol. 23, No. 6, Pages 874-886
Received: 2017-11-10; Revised: 2017-12-26; Published in print: 2018-06-16
DOI: 10.11834/jig.170157

Objective
In micro air vehicle systems, acquiring scene information in real time is the key problem for autonomous obstacle avoidance and navigation. This paper proposes ADCC-TSGM, a texture-optimized semi-global stereo matching algorithm that fuses the center average Census feature with the absolute difference (AD) feature, and accelerates it in parallel with the Compute Unified Device Architecture (CUDA).
Method
Texture information is computed with a one-dimensional difference along the epipolar line, the matching cost is computed from the center average Census feature and the AD feature, and the texture-optimized SGM algorithm aggregates the costs to produce an initial disparity map. A left-right consistency check then removes unstable and occluded points from the coarse disparity map, and linear interpolation and median filtering fill the remaining holes. Finally, exploiting the characteristics of the GPU, the cost computation, semi-global matching (SGM) aggregation, and disparity computation steps are optimized with shared memory, single-instruction multiple-data (SIMD) instructions, and a hybrid pipeline to increase the running speed.
Result
On the Middlebury stereo test set at Quarter Video Graphics Array (QVGA) resolution, the total bad-pixel rate of the proposed ADCC-TSGM algorithm is 36.1% lower than that of Semi-Global Block Matching (SGBM) and 28.3% lower than that of SGM, and its average error rate is 44.5% lower than that of SGBM and 49.9% lower than that of SGM. GPU acceleration experiments were conducted on the NVIDIA Jetson TK1 embedded computing platform; with the matching quality unchanged, CUDA parallel acceleration achieves a speedup of more than 117 times, and even compared with an SGBM implementation already optimized with SIMD and multi-core parallelism, the running time is reduced by 85%. At QVGA resolution, the GPU-accelerated frame rate reaches 31.8 frame/s.
Conclusion
The proposed algorithm and its CUDA acceleration offer embedded platforms an effective way to obtain high-quality depth information in real time, and can serve as a basic step of environment perception, visual localization, and map building for micro air vehicles, small robots, and similar devices.
Objective
In unmanned aerial vehicle systems, estimation of scene information in real time is a key issue in conducting automatic obstacle avoidance and navigation. A binocular stereo vision system is an effective means to obtain scene information; it simulates the working principle of the human eyes by using two cameras to capture the same scene at the same time and generates a disparity map by using a stereo matching algorithm. In this work, we propose ADCC-TSGM, a novel texture-optimized semi-global stereo matching algorithm based on the fusion of the absolute difference (AD) feature and the center average Census feature. Efforts are made to speed up the algorithm through CUDA parallel acceleration.
Method
First, a one-dimensional difference along the epipolar line is used to compute the texture information, the center average Census feature and the AD feature are exploited to compute the matching cost, and the texture-optimized semi-global matching algorithm aggregates the costs to obtain the initial disparity map. Second, a left-right consistency check is used to detect unstable and occluded pixels, and linear interpolation and a median filter are used to fill the holes in the disparity map. Lastly, to improve the running speed, we optimize the GPU code for each step of the stereo matching. In feature calculations such as the center average Census transform, the time spent on memory access is much higher than that spent on computation, and adjacent threads perform a large number of data-intensive computations on overlapping data. Consequently, we divide the data needed by an entire thread block into four regions, copy them into shared memory, and compute from shared memory to reduce the overhead of memory access. In addition, a single thread can handle two consecutive disparity calculations simultaneously by using SIMD instructions. While the GPU is processing, the CPU is mostly idle; therefore, a hybrid pipeline is designed to fully utilize the computing resources of the embedded platform.
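The cost-computation step described above can be sketched as follows. Note that the 5×5 window, the AD truncation threshold, and the fusion weights here are illustrative assumptions, not the paper's actual parameterization:

```python
import numpy as np

def center_average_census(img, win=5):
    """Census transform that compares each window pixel against the
    window *mean* rather than the raw center pixel (the 'center
    average' variant, more robust to noise on the center pixel)."""
    h, w = img.shape
    r = win // 2
    pad = np.pad(img.astype(np.float64), r, mode='edge')
    # mean intensity of each win x win window
    mean = sum(pad[r + dy:r + dy + h, r + dx:r + dx + w]
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)) / (win * win)
    code = np.zeros((h, w), dtype=np.uint32)
    bit = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if (dy, dx) == (0, 0):
                continue  # skip the center pixel itself
            code |= (pad[r + dy:r + dy + h, r + dx:r + dx + w]
                     > mean).astype(np.uint32) << np.uint32(bit)
            bit += 1
    return code

def hamming(a, b):
    """Per-pixel Hamming distance between two census code maps."""
    x = a ^ b
    cnt = np.zeros_like(x)
    while np.any(x):
        cnt += x & 1
        x >>= 1
    return cnt

def fused_cost(left, right, d, w_ad=1.0, w_cen=2.0, ad_trunc=32):
    """AD + center-average-Census cost at disparity d.
    Weights and truncation are illustrative, not from the paper."""
    cl = center_average_census(left)
    cr = center_average_census(right)
    w = left.shape[1]
    # left pixel x matches right pixel x - d
    ad = np.abs(left[:, d:].astype(int) - right[:, :w - d].astype(int))
    ham = hamming(cl[:, d:], cr[:, :w - d])
    return w_ad * np.minimum(ad, ad_trunc) + w_cen * ham
```

Taking the Census bits against the window mean instead of the raw center value is what distinguishes the center average variant: a single noisy center pixel cannot flip all of the window's bits at once.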
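The texture-optimized aggregation can be illustrated with a single left-to-right SGM path. The penalty values and the rule scaling them from the one-dimensional texture difference are illustrative only; they do not reproduce the paper's exact use of the coefficients ε1 and ε2:

```python
import numpy as np

def sgm_path_lr(cost, img_row=None, p1=8.0, p2=32.0):
    """One left-to-right SGM aggregation path over a single scanline.

    cost: (W, D) array of matching costs for one image row.
    If img_row is given, the penalties grow where the 1-D horizontal
    gradient (texture) is weak -- a sketch of texture-adaptive
    penalties, not the paper's exact rule."""
    w = cost.shape[0]
    agg = np.zeros_like(cost, dtype=np.float64)
    agg[0] = cost[0]
    for x in range(1, w):
        if img_row is not None:
            # 1-D difference along the epipolar (row) direction
            tex = abs(float(img_row[x]) - float(img_row[x - 1]))
            scale = 1.0 + 1.0 / (tex + 1.0)  # weak texture -> larger penalty
        else:
            scale = 1.0
        prev = agg[x - 1]
        best_prev = prev.min()
        # transitions: same disparity, +/-1 with P1, arbitrary jump with P2
        shift_m = np.concatenate(([np.inf], prev[:-1])) + p1 * scale
        shift_p = np.concatenate((prev[1:], [np.inf])) + p1 * scale
        jump = best_prev + p2 * scale
        agg[x] = cost[x] + np.minimum(np.minimum(prev, jump),
                                      np.minimum(shift_m, shift_p)) - best_prev
    return agg
```

The full SGM algorithm sums such path costs over several directions (typically 4 or 8) and then takes the winner-take-all disparity per pixel; subtracting `best_prev` keeps the accumulated values bounded, as in the standard formulation.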
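The post-processing step (consistency check plus hole filling) might look like the following in outline; the consistency tolerance and the 3×3 median window are illustrative assumptions:

```python
import numpy as np

def lr_consistency_check(disp_l, disp_r, thresh=1):
    """Mark pixels whose left/right disparities disagree (occlusions
    and mismatches) as invalid (-1). `thresh` is an illustrative
    tolerance, not the paper's value."""
    h, w = disp_l.shape
    out = disp_l.astype(np.int32).copy()
    xs = np.arange(w)
    for y in range(h):
        # left pixel x maps to right pixel x - d; disparities must agree
        xr = xs - disp_l[y]
        valid = (xr >= 0) & (np.abs(disp_l[y] -
                 disp_r[y, np.clip(xr, 0, w - 1)]) <= thresh)
        out[y, ~valid] = -1
    return out

def fill_holes(disp):
    """Fill invalid pixels (< 0) by linear interpolation along each
    scanline, then smooth with a 3x3 median filter (simplified)."""
    out = disp.astype(np.float64).copy()
    h, w = out.shape
    for y in range(h):
        bad = out[y] < 0
        if bad.all():
            continue
        good = ~bad
        out[y, bad] = np.interp(np.flatnonzero(bad),
                                np.flatnonzero(good), out[y, good])
    # 3x3 median filter over the filled map
    pad = np.pad(out, 1, mode='edge')
    stack = np.stack([pad[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)
```

Scanline interpolation propagates neighboring valid disparities into the holes, and the median filter suppresses the streak artifacts that purely horizontal filling tends to leave behind.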
Result
To demonstrate the effectiveness of the proposed algorithm, we use the NVIDIA Jetson TK1 developer kit, which has a quad-core ARM Cortex-A15 CPU, a Kepler GPU with 192 CUDA cores, and 2 GB of memory, as the embedded computing platform and conduct experiments on the Middlebury stereo datasets resized to QVGA resolution. In accordance with the actual application scenarios and the image resolution, the maximum disparity of each algorithm is set to 64, and the block matching window size of SGBM and BM is set to 9×9. The texture penalty coefficients ε1 and ε2 in the proposed algorithm are set to 0.25 and 0.125, respectively. Experimental results show that the total bad-pixel rate and the average error rate of the proposed algorithm are significantly lower than those of BM, SGBM, and SGM. The total bad-pixel rate of the ADCC-TSGM algorithm is 73.9% lower than that of the BM algorithm, 36.1% lower than that of the SGBM algorithm, and 28.3% lower than that of the SGM algorithm. The average error rate of the proposed algorithm is 83.2% lower than that of the BM algorithm, 44.5% lower than that of the SGBM algorithm, and 49.9% lower than that of the SGM algorithm. In particular, the use of the center average Census feature in matching reduces both the bad-pixel rate and the error rate. The texture-based optimization adaptively increases the penalty coefficients in low-texture regions and reduces the average error rate from 6.62 to 4.84. The post-processing steps, including the disparity consistency check and hole filling, reduce the total bad-pixel rate from 14.46 to 7.12. Through GPU parallel acceleration, the CUDA implementation of the proposed algorithm runs more than one hundred times faster than the pure CPU implementation without any loss in disparity map quality. Compared with SGBM, which has been optimized with SIMD and multi-core parallelism, the running time of the proposed algorithm is reduced by 85%. At QVGA resolution, the frame processing rate reaches 31.8 frame/s.
Conclusion
The proposed algorithm outperforms existing algorithms such as BM, SGM, and SGBM, which have been widely used in industry. The CUDA-accelerated implementation of the proposed algorithm provides an effective and feasible way to obtain high-quality disparity information in real time, and can serve as a basic means of environmental perception, visual positioning, and map construction for real-time embedded applications such as micro-aircraft systems.