发布时间: 2018-12-16 摘要点击次数: 全文下载次数: DOI: 10.11834/jig.180342 2018 | Volume 23 | Number 12 图像分析和识别

1. 南昌航空大学测试与光电工程学院, 南昌 330063;
2. 南昌航空大学信息工程学院, 南昌 330063
 收稿日期: 2018-06-04; 修回日期: 2018-07-23 基金项目: 国家自然科学基金项目（61162023，61772255）；航空科学基金项目（2016ZC56005）；江西省重点研发计划一般基金项目（20171BBG70052，20161BBE50080） 第一作者简介: 江少锋, 1978年生, 男, 副教授, 博士, 主要研究方向为光流计算、医学图像处理、深度学习等。E-mail:jsphone@163.com;赵鹏, 男, 研究生, 主要研究方向为光流计算。E-mail:2257071325@qq.com;杨素华, 女, 讲师, 主要研究方向为图像处理。E-mail:43009@nchu.edu.cn;陈震, 男, 教授, 主要研究方向为光流计算。E-mail:DR_CHENZHEN@sina.com;张聪炫, 男, 副教授, 主要研究方向为光流计算。E-mail:zcxdsg@163.com. 中图法分类号: TP391 文献标识码: A 文章编号: 1006-8961(2018)12-1852-12

# 关键词

Parallel computing of variational optical flow based on the Markov random field model
Jiang Shaofeng1, Zhao Peng1, Yang Suhua2, Chen Zhen1, Zhang Congxuan1
1. School of Measuring and Optical Engineering, Nanchang Hangkong University, Nanchang 330063, China;
2. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
Supported by: National Natural Science Foundation of China (61162023, 61772255)

# Abstract

Objective The Markov random field (MRF) model for variational optical flow computing is a robust one. The MRF variational energy function includes the data and smooth terms. Brief propagation (BP) is a highly effective global method that minimizes the MRF energy function by iteratively transmitting messages from one pixel to the next in four directions. However, its computational cost remains high. The computational complexity of the BP method with a 3-level data pyramid is O($N$×$L$2×(10.5$T$ + 2)), where $N$ is the number of pixels in the image, $T$ is the iterative time and $L$ is the width of spatial sampling. We proposed a new MRF model to calculate optical flow and a parallel method to minimize the MRF energy function, which can dramatically reduce computational time. Method The data term in the proposed MRF variational energy function for optical flow computing is derived from the optical flow constraint equation from the brightness constancy assumption proposed by Horn. The term is smoothed by a non-square penalty function to reduce the boundary effect. To realize parallel computing on the CUDA platform, we proposed an optimal parallel BP method to minimize the MRF energy function. This optimal method used a 3D grid with four levels to perform the parallel method. At varying levels, the message was transmitted along different directions in the neighborhood. At each level, we assigned $L$ threads to one pixel and used the parallel dimension reduction method to calculate the message to be transmitted. Thus, we dramatically reduced the computational load of one thread and computational time. The dimension reduction method transforms the 2D energy function into two independent 1D energy functions and uses a low parabola envelope-based method to calculate the 1D energy function, whose computational complexity is O(4$L$2). The parallel dimension reduction method simultaneously calculates the MRF energy functions of the different rows (or columns) in the label region of optical flow, whose size is $L$×$L$ by allowing each thread to correspond to a row (or column) in the region. Thus, the computational complexity of the proposed parallel dimension reduction method is reduced to O(4$L$). All parallel processing steps were completed on the CUDA platform for general parallel computing on GPU. The first step is calculating the data term by assigning $N$ threads and letting each thread correspond to a pixel in the image. The corresponding computational complexity is O($L$2). The second step is the use of the proposed parallel dimension reduction method to calculate the message and transmit it to up, down, left, and right for $T$ times from the first to the third level in the data pyramid with three levels by assigning 4$L$ threads to a pixel in the image. The corresponding computational complexity is O(3×4×$L$×$T$). The last step is outputting the optical flow corresponding to the minimal message in each pixel by assigning $N$ threads and letting each thread correspond to a pixel in the image. The corresponding computational complexity is O($L$2). Thus, the entire computational complexity of this method is only O(2$L$2 + 12$L$×$T$) with three-level data pyramid. Result In the experiment for rotational sphere image sequence, the computational time was only 0.27 s, and corresponding computational time of the conventional method was 85 s. The computational efficiency of the proposed method was 314 times more than that of the conventional method. Our proposed method performed as rapidly as the Horn method, and faster than the Weickert (6 s) and Grauer-Gray (0.68 s) methods but slower than the Hossen method (0.01 s). In comparative experiments on errors with the Horn, Weickert, Hossen, and Grauer-Gray methods by using three image sequences (sphere image sequence with rotational moving, the Yosemite Valley image sequence with high change of brightness, and RubberWhale image sequence with many moving objects), the proposed method achieved the lowest average endpoint error (AEE) (0.13, 0.55, and 0.34, respectively) on all testing image sequences. Conclusion This paper proposed a novel parallel computing strategy of variational optical flow based on the MRF model by using a 3D grid with four levels to simultaneously transmit the message in the BP method along four directions, where multi-threads were assigned to one pixel for calculating the message to be transmitted. The results indicate that our proposed model has high computational efficiency and can obtain accurate optical flow from image sequences with rotational moving, high change of brightness and many moving objects. However, our proposed method requires a substantial amount of memory, which limits the width of spatial sampling. Therefore, our proposed method is unsuitable for calculating optical flows with large displacement from image sequences with high resolution.

# Key words

optical flow; Markov random field; parallel computing on CUDA; brief propagation method; non-squared penalty function

# 1.1 MRF光流计算能量方程

 $E\left( {u,v} \right) = {E_{{\rm{smooth}}}}\left( {u,v} \right) + \lambda {E_{{\rm{data}}}}\left( {u,v} \right)$ (1)

 ${E_{{\rm{data}}}}\left( {u,v} \right) = \sum\limits_{p \in {\cal P}} {\psi \left( {{D_p}\left( {{u_p},{v_p}} \right)} \right)}$ (5)

 $\begin{array}{*{20}{c}} {{D_p}\left( {{u_p},{v_p}} \right) = }\\ {\left\| {{I_1}\left( {x + \alpha {u_p},y + \alpha {v_p}} \right) - {I_2}\left( {x,y} \right)} \right\|} \end{array}$ (6)

 $\begin{array}{*{20}{c}} {{D_p}\left( {{u_p},{v_p}} \right) = \left\| {{I_1}\left( {x + \alpha {u_p},y + \alpha {v_p}} \right) - {I_2}\left( {x,y} \right)} \right\| + }\\ {\gamma \left\| \begin{array}{l} I_1^x\left( {x,y} \right)\alpha {u_p} + I_1^y\left( {x,y} \right)\alpha {v_p} + \\ {I_2}\left( {x,y} \right) - {I_1}\left( {x,y} \right) \end{array} \right\|} \end{array}$ (7)

# 1.2 BP算法最小化MRF能量方程

 $\begin{array}{*{20}{c}} {m_{pq}^t\left( {{f_q}} \right) = \mathop {\min }\limits_{{f_p}} \left( {V\left( {{f_p},{f_q}} \right) + {D_p}\left( {{f_p}} \right) + } \right.}\\ {\left. {\sum\limits_{s \in \mathit{\boldsymbol{\mathcal{N}}}\left( p \right)\backslash q} {m_{sp}^{t - 1}\left( {{f_p}} \right)} } \right)} \end{array}$ (8)

 ${b_q}\left( {{f_q}} \right) = {D_q}\left( {{f_q}} \right) + \sum\limits_{p \in \mathit{\boldsymbol{\mathcal{N}}}\left( q \right)} {m_{sp}^T\left( {{f_q}} \right)}$ (9)

# 1.5 采用BP方法计算光流步骤

1) 计算数据项：载入连续两帧图像根据式(7)计算数据项，这样每个像素点需进行$L^2$次计算，得到$L^2$个数据项。假设一幅图像有$N$个像素点，本步计算复杂度为${\rm{O}}\left({N{L^2}} \right)$

2) 计算并传递信息：因为光流计算是2维的，于是将式(8)的$p$点传递给$q$点的信息改为

 $\begin{array}{*{20}{c}} {m_{pq}^t\left( {{u_q},{v_q}} \right) = \mathop {\min }\limits_{{u_p},{v_p}} \left( {{{\left( {{u_q} - {u_p}} \right)}^2} + } \right.}\\ {\left. {{{\left( {{v_q} - {v_p}} \right)}^2} + h\left( {{u_p},{v_p}} \right)} \right)} \end{array}$ (10)

 $\begin{array}{*{20}{c}} {{u_p},{u_q},{v_q},{v_q} = - \left( {L/2} \right), \cdots ,\left( {L/2} \right) - 1}\\ {h\left( {{u_p},{v_p}} \right) = }\\ {\sum\limits_{s \in \mathit{\boldsymbol{\mathcal{N}}}\left( p \right)\backslash q} {\left( {m_{sp}^{t - 1}\left( {{u_p},{v_p}} \right) + \psi \left( {{D_p}\left( {{u_p},{v_p}} \right)} \right)} \right)} } \end{array}$ (11)

 $\begin{array}{l} m_{pq}^t\left( {{u_q},{v_q}} \right) = \mathop {\min }\limits_{{u_p}} \left( {{{\left( {{u_q} - {u_p}} \right)}^2} + \mathop {\min }\limits_{{v_p}} \left( {{{\left( {{v_q} - {v_p}} \right)}^2} + } \right.} \right.\\ \left. {\left. {h\left( {{u_p},{v_p}} \right)} \right)} \right) = \mathop {\min }\limits_{{u_p}} \left( {{{\left( {{u_q} - {u_p}} \right)}^2} + m_{h\left| {{u_p}} \right.}^t\left( {{v_q}} \right)} \right) \end{array}$ (12)

 $m_{h\left| {{u_p}} \right.}^t\left( {{v_q}} \right) = \mathop {\min }\limits_{{v_p}} \left( {{{\left( {{v_q} - {v_p}} \right)}^2} + h\left( {{u_p},{v_p}} \right)} \right)$ (13)

 $\begin{array}{l} {s_1} = \frac{{\left( {f\left( r \right) + {r^2}} \right) - \left( {f\left( q \right) + {q^2}} \right)}}{{2r - 2q}}\\ {s_2} = \frac{{\left( {f\left( q \right) + {q^2}} \right) - \left( {f\left( z \right) + {z^2}} \right)}}{{2q - 2z}} \end{array}$ (14)

2) 并行计算信息和二分法传递信息，本文主要对最为复杂的信息计算及信息传递部分的并行计算做了创新性设计。由于需要对上下左右4个方向进行信息更新，在分配线程块的时候采用了式(19)在CUDA中分配了如图 6所示的一个3维的线程块结构

 ${\rm{Dim3}}\;{\rm{Blocks}}\left( {{W_b}/2,{H_b},4} \right)$ (19)

 ${\rm{Dim3}}\;{\rm{Threads}}\left( {{n_{TX}},{n_{TY}},L} \right)$ (20)

# 3 实验

 $A = \frac{1}{N}\sum\limits_{i = 1}^N {{\phi _{\rm{e}}}\left( i \right)}$ (25)

 $\begin{array}{*{20}{c}} {{\phi _{\rm{e}}}\left( i \right) = }\\ {\arccos \left( {\frac{{u_i^{\rm{c}}u_i^{\rm{e}} + v_i^{\rm{c}}v_i^{\rm{e}} + 1}}{{\sqrt {{{\left( {u_i^{\rm{c}}} \right)}^2} + {{\left( {v_i^{\rm{c}}} \right)}^2} + 1} \sqrt {{{\left( {u_i^{\rm{e}}} \right)}^2} + {{\left( {v_i^{\rm{e}}} \right)}^2} + 1} }}} \right)} \end{array}$ (26)

AEE表示($\left({{u^e}, {v^e}} \right)$)和($\left({{u^c}, {v^c}} \right)$)之间的平均绝对误差，其表达式为

 $E = \frac{1}{N}\sum\limits_{i = 1}^N {\sqrt {{{\left( {u_i^{\rm{c}} - u_i^{\rm{e}}} \right)}^2} + {{\left( {v_i^{\rm{c}} - v_i^{\rm{e}}} \right)}^2}} }$ (27)

# 3.1 旋转球图像序列实验

Table 1 The optical flow errors of different methods on rotary sphere image sequence

 方法 AAE AEE TIME/s Horn 3.90 0.14 0.4 Weickert 7.11 0.21 6 Hossen 4.48 0.15 0.01 Grauer-Gray 4.40 0.15 0.68 本文 3.80 0.13 0.27(并行)85(串行)

# 3.2 Yosemite山谷图像序列实验

Table 2 The optical flow errors of different methods on Yosemite Valley image sequence

 方法 AAE AEE TIME/s Horn 19.39 1.10 0.5 Weickert 10.80 0.57 17.7 Hossen 12.02 0.64 0.03 Grauer-Gray 14.13 0.61 1.16 本文 12.41 0.55 0.50

# 3.3 RubberWhale图像序列实验

Table 3 The optical flow errors of different methods on RubberWhale image sequence

 方法 AAE AEE TIME/s Horn 12.04 0.42 1.67 Weickert 24.94 0.71 88.1 Hossen 9.19 0.35 0.08 Grauer-Gray 12.75 0.43 1.95 本文 10.70 0.34 1.62

# 参考文献

• [1] Horn B K P, Schunck B G. Determining optical flow[J]. Artificial Intelligence, 1981, 17(1-3): 185–203. [DOI:10.1016/0004-3702(81)90024-2]
• [2] Lucas B D, Kanade T. An iterative imge registration technique with an application to stereo vision[C]//Proceedings of the 7th International Joint Conference on Artificial Intelligence. Vancouvcr BC, Canada: Morgan Kaufmann Publishers Inc., 1981, 2: 674-679.
• [3] Zimmer H, Bruhn A, Weickert J. Optic flow in harmony[J]. International Journal of Computer Vision, 2011, 93(3): 368–388. [DOI:10.1007/s11263-011-0422-6]
• [4] Papenberg N, Bruhn A, Brox T, et al. Highly accurate optic flow computation with theoretically justified warping[J]. International Journal of Computer Vision, 2006, 67(2): 141–158. [DOI:10.1007/s11263-005-3960-y]
• [5] Heitz D, Mémin E, Schnörr C. Variational fluid flow measurements from image sequences:synopsis and perspectives[J]. Experiments in Fluids, 2010, 48(3): 369–393. [DOI:10.1007/s00348-009-0778-3]
• [6] Werlberger M, Trobin W, Pock T, et al. Anisotropic Huber-L1 optical flow[C]//Proceedings of the British Machine Vision Conference. London, UK: BMVA Press, 2009: 1-11.[DOI: 10.5244/c.23.108]
• [7] Zimmer H, Bruhn A, Weickert J. Optic flow in harmony[J]. International Journal of Computer Vision, 2011, 93(3): 368–388. [DOI:10.1007/s11263-011-0422-6]
• [8] Felzenszwalb P F, Huttenlocher D P. Efficient belief propagation for early vision[J]. International Journal of Computer Vision, 2006, 70(1): 41–54. [DOI:10.1007/s11263-006-7899-4]
• [9] Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(11): 1222–1239. [DOI:10.1109/34.969114]
• [10] Weiss Y, Freeman W T. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs[J]. IEEE Transactions on Information Theory, 2001, 47(2): 736–744. [DOI:10.1109/18.910585]
• [11] Feng C L, Zhao D Z, Huang M. Image segmentation using CUDA accelerated non-local means denoising and bias correction embedded fuzzy c-means (BCEFCM)[J]. Signal Processing, 2016, 122: 164–189. [DOI:10.1016/j.sigpro.2015.12.007]
• [12] Fluck O, Vetter C, Wein W, et al. A survey of medical image registration on graphics hardware[J]. Computer Methods and Programs in Biomedicine, 2011, 104(3). [DOI:10.1016/j.cmpb.2010.10.009]
• [13] Awan A A, Hamidouche K, Hashmi J M, et al. S-Caffe: co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters[C]//Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Austin, Texas, USA: ACM, 2017: 193-205.[DOI: 10.1145/3018743.3018769]
• [14] Cuomo S, Galletti A, Giunta G, et al. Toward a multi-level parallel framework on GPU cluster with PetSC-CUDA for PDE-based optical flow computation[J]. Procedia Computer Science, 2015, 51: 170–179. [DOI:10.1016/j.procs.2015.05.220]
• [15] Hossen K, Mahmud H. Parallel optical flow detection using CUDA[D]. Dhaka, Bangladesh: BRAC University, 2014.
• [16] Grauer-Gray S, Kambhamettu C, Palaniappan K. GPU implementation of belief propagation using CUDA for cloud tracking and reconstruction[C]//Proceedings of 2008 IAPR Workshop on Pattern Recognition in Remote Sensing. Tampa, Florida, USA: IEEE, 2008: 1-4.[DOI: 10.1109/prrs.2008.4783169]
• [17] Felzenszwalb P F, Huttenlocher D P. Distance transforms of sampled functions[J]. Theory of Computing, 2012, 8(19): 415–428. [DOI:10.4086/toc.2012.v008a019]
• [18] Jiang S F, Yang S H, Chen Z, et al. Parallel computing of signed distance function in level set based on dimension reduction[J]. Journal of Image and Graphics, 2018, 23(2): 174–181. [江少锋, 杨素华, 陈震, 等. 水平集中符号距离函数并行降维计算[J]. 中国图象图形学报, 2018, 23(2): 174–181. ] [DOI:10.11834/jig.170348]
• [19] Barron J L, Fleet D J, Beauchemin S S. Performance of optical flow techniques[J]. International Journal of Computer Vision, 1994, 12(1): 43–77. [DOI:10.1007/bf01420984]