Published: 2019-11-16 DOI: 10.11834/jig.190066 2019 | Volume 24 | Number 11 Image Understanding and Computer Vision

Received: 2019-03-12; revised: 2019-06-05; preprint: 2019-06-12. First author: Gao Tianhan, born in 1978, male, professor and doctoral supervisor; his research interests include virtual/augmented reality, pervasive computing, and network and cyberspace security. E-mail: gaoth@mail.neu.edu.cn. CLC number: TP301.6 Document code: A Article ID: 1006-8961(2019)11-1952-10


Revised image-sequence-based incremental structure from motion algorithm
Gao Tianhan1,2, Yang Ziyi1
1. Software College, Northeastern University, Shenyang 110000, China;
2. Interactive Media and Computing Technology Institute, Software College, Northeastern University, Shenyang 110000, China

# Abstract

Objective With the rapid development of virtual reality technology, establishing virtual scenes quickly and realistically has become a bottleneck that restricts the popularization of virtual reality. As an important means of solving this problem, 3D reconstruction technology has been applied in many fields, such as the protection and restoration of ancient buildings, medicine, and tourism. The technology has developed continuously in recent years, its application scenarios have expanded greatly, the scale of the data to be processed has increased substantially, and the accuracy requirements for reconstruction results have continuously risen. Under these conditions, many problems of the traditional methods are exposed. For instance, the traditional incremental structure from motion algorithm suffers from poor robustness in the initial image pair selection process, an inefficient incremental solving process, redundant computation in bundle adjustment, and errors that remain after model correction. Accordingly, this study proposes a new incremental structure from motion algorithm, named SFM-Y, for image-sequence-based 3D reconstruction. Method First, an improved adaptive outlier-filtering method is proposed to enhance the robustness of initial image selection. An adaptive threshold estimation model is introduced, and constraints and filter conditions are added, to resolve the robustness problem caused by the manually set threshold used in the RANSAC (random sample consensus) filtering process. The constraints include a four-point verification method, a wide-baseline constraint, and a revised five-point outlier culling method. In this way, the initial image pair used for initial reconstruction is selected with strong robustness.
Second, the proposed method performs incremental iterative reconstruction to enrich the point cloud model. In this process, an improved efficient perspective-$n$-point (EPnP) solution method is proposed to improve the computational efficiency and accuracy of the incremental addition step. The method combines the idea of weighted refinement with a technique for reducing the algebraic error of a linear system, and a rigorous derivation is given to show that it efficiently accelerates the incremental solving process of the incremental structure from motion algorithm. Finally, an optimized bundle adjustment strategy is used to correct the model, resolve model drift, and reduce the re-projection error. In this stage, the integrity of the reconstruction is checked by introducing a minimum triangulation angle and re-triangulating the tracks among different projection points. Then, one or two rounds of iterative optimization (comprising global bundle adjustment, filtering, and re-triangulation) are executed to further improve the result. All of these methods work together toward a common goal: reducing solution time and improving reconstruction quality. Result Data sets of different scales are selected, and comparisons are performed among the methods involved in this study to analyze algorithm performance comprehensively and objectively. Experimental results show that the SFM-Y algorithm improves computational efficiency and result quality compared with the traditional incremental structure from motion algorithm, and the performance analysis indicates that the proposed method is more efficient than the traditional ones.
The reconstruction accuracy shows a 10% improvement over the traditional algorithms referred to in this study. Conclusion After a series of analyses, arguments, and experiments, the following conclusions are drawn. Experiments on several data sets show that the proposed incremental structure from motion algorithm (SFM-Y) can efficiently and accurately achieve image-sequence-based 3D reconstruction. The algorithm provides a new way of thinking for incremental structure from motion, has a positive effect on the development of 3D reconstruction research, and is meaningful for subsequent research and exploration. Compared with traditional methods, it offers high computational efficiency, strong initial-reconstruction robustness, and high quality of the generated model.

# Key words

incremental structure from motion algorithm; 3D reconstruction; image sequence; PNP (perspective-$n$-point) problem; bundle adjustment

# 0 Introduction

1) The static threshold is replaced by an improved adaptive threshold estimation method, and corresponding constraint filters are added to reduce the proportion of outliers, thereby improving the robustness of initial image pair selection.

2) The incremental addition process is optimized: an improved EPnP (efficient perspective-$n$-point) algorithm replaces the traditional method to increase the solving speed and accuracy of the algorithm.

3) The bundle adjustment algorithm is optimized to reduce redundant computation as much as possible while maintaining accuracy, which improves the efficiency of the algorithm to a certain extent.

# 2.1.1 Adaptive threshold estimation

 $NFA(\boldsymbol{M}, k) = N_{\rm out}\left(n - N_{\rm sam}\right)\binom{n}{k}\binom{k}{N_{\rm sam}}\left(e_k(\boldsymbol{M})^d \alpha_0\right)^{k - N_{\rm sam}}$ (1)

 $NFA(\boldsymbol{M}) = \min\limits_{k = N_{\rm sam} + 1, \ldots, n} NFA(\boldsymbol{M}, k) \le \varepsilon$ (2)
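The a contrario selection of Eqs. (1)-(2) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`nfa`, `best_k`) and the synthetic parameter values are hypothetical, and `errors_sorted` is assumed to hold the residuals of the correspondences sorted in increasing order.

```python
from math import comb

def nfa(n, k, n_sam, n_out, e_k, d, alpha0):
    """Number of False Alarms for a model with k hypothesized inliers,
    as in Eq. (1): smaller values mean the fit is less likely by chance."""
    return (n_out * (n - n_sam)
            * comb(n, k) * comb(k, n_sam)
            * (e_k ** d * alpha0) ** (k - n_sam))

def best_k(n, n_sam, n_out, errors_sorted, d, alpha0, eps=1.0):
    """Scan k = n_sam+1 .. n as in Eq. (2): return (k, NFA) minimizing the
    NFA, or None when no k is meaningful (i.e., the minimum exceeds eps).
    The inlier/outlier threshold then adapts to the data: it is the k-th
    smallest residual rather than a manually fixed value."""
    best = None
    for k in range(n_sam + 1, n + 1):
        v = nfa(n, k, n_sam, n_out, errors_sorted[k - 1], d, alpha0)
        if best is None or v < best[1]:
            best = (k, v)
    return best if best is not None and best[1] <= eps else None
```

With eight tight matches and two gross outliers, the scan picks k = 8, i.e., the threshold stops just before the residuals jump.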

# 2.1.2 Introducing constraints and filter conditions

1) The four-point method is used to compute the homography matrix and verify the matching points; matches that pass the verification are called inliers, and the images are sorted by inlier count so as to select images with more reliable matching feature points.

2) To compensate for the deficiency of short-baseline matching in depth estimation, a wide baseline is chosen as one of the constraints [6, 24] to ensure that the spatial positions of points recovered by the subsequent incremental addition and triangulation are reliable; epipolar geometry and the feature descriptors obtained from feature extraction are then used for the subsequent operations.

3) An improved Nister five-point method is adopted, which rejects outliers by minimizing the point-to-epipolar-line distance to guarantee the correctness of the fundamental matrix, thereby ensuring that, once the camera pose is determined, the mapping between points in the 2D images and the corresponding points in 3D space is correct, namely

 $\sum\limits_{i = 0}^{N}\left(d\left(\boldsymbol{x}_i^{\prime}, \boldsymbol{E}\boldsymbol{x}_i\right)^2 + d\left(\boldsymbol{x}_i, \boldsymbol{E}^{\rm T}\boldsymbol{x}_i^{\prime}\right)^2\right) = \sum\limits_{i = 0}^{N}\left(\boldsymbol{x}_i^{\prime{\rm T}}\boldsymbol{E}\boldsymbol{x}_i\right)^2\left(\frac{1}{\left(\boldsymbol{E}\boldsymbol{x}_i\right)_1^2 + \left(\boldsymbol{E}\boldsymbol{x}_i\right)_2^2} + \frac{1}{\left(\boldsymbol{E}^{\rm T}\boldsymbol{x}_i^{\prime}\right)_1^2 + \left(\boldsymbol{E}^{\rm T}\boldsymbol{x}_i^{\prime}\right)_2^2}\right)$ (3)
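The per-correspondence term of Eq. (3) is straightforward to evaluate. The following is a minimal sketch (function names are hypothetical; `E` is the 3x3 essential matrix as nested lists, `x` and `xp` are homogeneous 3-vectors of a match): correspondences whose symmetric distance exceeds a threshold would be culled as outliers.

```python
def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def sym_epipolar_dist2(E, x, xp):
    """Squared symmetric point-to-epipolar-line distance for one
    correspondence x <-> x', i.e., one summand of Eq. (3)."""
    Ex = matvec(E, x)                                  # epipolar line of x in image 2
    Et = [[E[j][i] for j in range(3)] for i in range(3)]
    Etxp = matvec(Et, xp)                              # epipolar line of x' in image 1
    r = sum(xp[i] * Ex[i] for i in range(3))           # epipolar residual x'^T E x
    return r * r * (1.0 / (Ex[0] ** 2 + Ex[1] ** 2)
                    + 1.0 / (Etxp[0] ** 2 + Etxp[1] ** 2))
```

For a pure x-translation camera pair, a match with equal y coordinates lies exactly on its epipolar line, so the distance vanishes, while a vertically displaced match gets a positive penalty.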

4) Based on the selected initial image pair and the computed camera poses, the two-view reconstruction method is used to perform the initial reconstruction.

# 2.2.1 Optimizing the linear system to be solved

 $d_{i}\left[\begin{array}{l}{\boldsymbol{u}_{i}} \\ {1}\end{array}\right]=\boldsymbol{A}[\boldsymbol{R} | \boldsymbol{t}]\left[\begin{array}{l}{\boldsymbol{p}_{i}^{ w}} \\ {1}\end{array}\right]$ (4)

 $\left[\begin{array}{ccc}{1} & {0} & {-u_{i}^{c}} \\ {0} & {1} & {-v_{i}^{c}}\end{array}\right] \boldsymbol{B}_{i} \boldsymbol{x}=0$ (5)
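Eliminating the depth $d_i$ from Eq. (4) yields the two homogeneous constraints per point of Eq. (5). In the EPnP formulation [22], each 3D point is a barycentric combination of four control points, and the unknown vector $\boldsymbol{x}$ stacks the control points' camera-frame coordinates. A minimal sketch of how the two rows are assembled follows (the function name is hypothetical; `fu`, `fv`, `uc`, `vc` are the intrinsics from $\boldsymbol{A}$, and `alphas` are the four barycentric coefficients of the point):

```python
def epnp_rows(alphas, fu, fv, uc, vc, u, v):
    """Build the two rows contributed by one image observation (u, v)
    to the 2N x 12 homogeneous system M x = 0 of Eq. (5)."""
    r1, r2 = [], []
    for a in alphas:           # one (x, y, z) triple per control point
        r1 += [a * fu, 0.0, a * (uc - u)]
        r2 += [0.0, a * fv, a * (vc - v)]
    return r1, r2
```

As a sanity check, if a point coincides with the first control point at camera coordinates (1, 2, 4) and the intrinsics are $f_u = f_v = 800$, $u_c = 320$, $v_c = 240$, its projection (520, 640) makes both rows annihilate the true solution vector.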

# 2.2.2 Rejecting erroneous values and correspondences

 $z_s = \mathop{\arg\min}\limits_{\|z\|_F = 1} \sum\limits_{i = 1}^{N}\left\|\sum\limits_{j = 1}^{4}\alpha_{ij}\left(z_j^c u_i - \tilde{x}_j^c\right)\right\|^2$ (6)

# 2.2.3 Refining the result

 $\tilde{x}_i^c - \bar{x}^c = \boldsymbol{R}\left(\tilde{x}_i^{\omega} - \bar{x}^{\omega}\right), \quad i = 1, 2, \cdots, N$ (7)

 $\frac{1}{s_i^*}\left(\tilde{x}_i^c - \bar{x}^c\right) = \frac{1}{s_i^*}\boldsymbol{R}\left(\tilde{x}_i^{\omega} - \bar{x}^{\omega}\right), \quad i = 1, 2, \cdots, N$ (8)

 $\boldsymbol{D} = \sum\limits_{i = 1}^{N}\frac{1}{s_i^{*2}}\left(\tilde{x}_i^c - \bar{x}^c\right)\left(\tilde{x}_i^{\omega} - \bar{x}^{\omega}\right)^{\rm T}$
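The rotation that best aligns the centered point sets of Eqs. (7)-(8) is obtained from the matrix $\boldsymbol{D}$ by the standard orthogonal Procrustes step, a detail the text leaves implicit; the usual closed form (not necessarily the exact variant used here) is

 $\boldsymbol{D} = \boldsymbol{U}\boldsymbol{\varSigma}\boldsymbol{V}^{\rm T}, \quad \boldsymbol{R} = \boldsymbol{U}\boldsymbol{V}^{\rm T}$

with the sign corrected (flip the last column of $\boldsymbol{U}$) whenever $\det\left(\boldsymbol{U}\boldsymbol{V}^{\rm T}\right) < 0$, after which the translation follows as $\boldsymbol{t} = \bar{x}^c - \boldsymbol{R}\bar{x}^{\omega}$.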

# 2.3 BA optimization and correction

HSFM (hybrid structure-from-motion) [27] improves model quality by adding BA to every incremental iteration of camera registration and triangulation. This raises model accuracy, but because incremental SFM affects the model only locally, and given the positive effect of the improvements already made, a global BA is usually unnecessary at every iteration. Therefore, taking the BA strategy of HSFM as a reference, this paper adopts the following improved steps:

1) Global BA. An improved BA formulation that introduces the Huber function as the loss function is used. Instead of running global BA at every iteration, it is executed only after the model has grown by a certain proportion, namely

 $\min\limits_{C_i, X_j} \sum\limits_{i = 1}^{N}\sum\limits_{j = 1}^{M}\delta_{ij}\left\|x_{ij} - \gamma\left(\boldsymbol{K}_i, \boldsymbol{R}_i, C_i, X_j\right)\right\|_{\rm huber}$ (9)
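The two ingredients of this step, the Huber norm of Eq. (9) and the growth-triggered scheduling of global BA, can be sketched as follows. Function names, the Huber width `delta`, and the 25% growth threshold are hypothetical defaults for illustration; the paper does not state the exact values.

```python
def huber(r, delta=1.0):
    """Huber cost of a residual r: quadratic for |r| <= delta, linear
    beyond, which damps the influence of outlier observations in BA."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def should_run_global_ba(n_points_now, n_points_last_ba, growth=0.25):
    """Strategy step 1): trigger a global BA only once the model has
    grown by the given proportion since the last global BA."""
    return n_points_now >= (1.0 + growth) * n_points_last_ba
```

In practice each reprojection residual $x_{ij} - \gamma(\cdot)$ would be passed through `huber` before summation, so that a few grossly wrong tracks cannot dominate the optimization.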

2) Filtering and re-triangulation. Re-projection error arises during the computation as the difference between the projection of a real 3D point onto the image plane (a pixel in the image) and the computed virtual pixel; owing to the locations of the extracted feature points, the triangulation computation, and other factors, the computed values never match reality exactly (i.e., the error is never exactly zero). The strategy adopted here is a geometric integrity check: a minimum triangulation angle is introduced over all pairs of observation rays, and the tracks among projection points are re-triangulated, namely

 $\cos \alpha=\frac{t_{a}-X_{a b}}{\left\|t_{a}-X_{a b}\right\|_{2}} \cdot \frac{t_{b}-X_{a b}}{\left\|t_{b}-X_{a b}\right\|_{2}}$ (10)
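Eq. (10) measures the angle between the two rays observing a point; tracks whose angle is too small are poorly conditioned for triangulation and are filtered. A minimal sketch (the function name and the example threshold of 1.5 degrees are hypothetical; `t_a`, `t_b` are the camera centers and `X` the triangulated point):

```python
from math import sqrt, acos, degrees

def triangulation_angle_deg(t_a, t_b, X):
    """Angle in degrees between the viewing rays t_a - X and t_b - X,
    i.e., the alpha of Eq. (10)."""
    ra = [t_a[i] - X[i] for i in range(3)]
    rb = [t_b[i] - X[i] for i in range(3)]
    na = sqrt(sum(c * c for c in ra))
    nb = sqrt(sum(c * c for c in rb))
    c = sum(ra[i] * rb[i] for i in range(3)) / (na * nb)
    return degrees(acos(max(-1.0, min(1.0, c))))   # clamp for safety
```

A point close to the baseline gives a wide angle and is kept, while a very distant point yields a near-zero angle and would fall below the minimum-angle filter.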

3) Iterative optimization. Finally, to further improve model accuracy and enhance the scene, one to two rounds of iterative optimization (repeating steps 1) and 2)) are performed so that the final sparse point cloud achieves a good result.

# 3 Experiments and performance analysis

Table 1 Average outlier proportion

(unit: %)

| Dataset | bundler | visual SFM | AC-RANSAC | F-AC-RANSAC |
| --- | --- | --- | --- | --- |
| South Building | 0.20 | 0.19 | 0.19 | **0.17** |
| Graham_Hall | 0.30 | 0.27 | 0.26 | **0.25** |
| Nyc_library | 1.60 | 1.53 | 1.40 | **1.35** |
| Tower of London | 1.20 | 1.18 | 1.16 | **1.13** |
| Union_square | 1.10 | 1.05 | 1.00 | **0.96** |

Note: bold indicates the best filtering result.

1) Solving efficiency. This experiment mainly verifies the advantage of the improved EPnP method of this paper (OWEPNP) in solving efficiency.

2) Solving accuracy. The average rotation error and average translation error are used to measure the solving accuracy of the OWEPNP method. They are the means of the camera pose errors (rotation and translation errors) over all sampled points; the more stable the values, the more accurate the solving method, as shown in Figs. 3 and 4. As can be seen from Figs. 3 and 4, compared with the other two methods, OWEPNP reduces the error of the solved points to a certain extent and improves their precision, so that the reconstruction results come closer to the ideal.
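The pose error metrics above can be made concrete. A common definition (assumed here; the paper does not spell out its formulas, and the function names are hypothetical) takes the rotation error as the angle of the relative rotation $\boldsymbol{R}_{\rm est}^{\rm T}\boldsymbol{R}_{\rm gt}$ and the translation error as the Euclidean distance between translations:

```python
from math import acos, degrees, sqrt

def rotation_error_deg(R_est, R_gt):
    """Geodesic rotation error in degrees: the angle of R_est^T R_gt,
    recovered from trace(R) = 1 + 2 cos(theta)."""
    tr = sum(R_est[k][i] * R_gt[k][i] for i in range(3) for k in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))      # clamp numerical noise
    return degrees(acos(c))

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and reference translations."""
    return sqrt(sum((t_est[i] - t_gt[i]) ** 2 for i in range(3)))
```

Averaging these two quantities over all sampled camera poses yields the curves compared in Figs. 3 and 4.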

Table 2 Comparison of reconstruction result

| Dataset | Images | Registered images (VSFM / bundler / SFM-Y) | Points in result (VSFM / bundler / SFM-Y) | Time/min (VSFM / bundler / SFM-Y) | Mean reprojection error/pixel (VSFM / bundler / SFM-Y) |
| --- | --- | --- | --- | --- | --- |
| South Building | 128 | 128 / 100 / 128 | 45 000 / 38 000 / 60 000 | 8.976 / 30.325 / 6.064 | 0.68 / 0.75 / 0.497 |
| Graham_Hall | 1 273 | 1 084 / 980 / 1 251 | 380 000 / 364 000 / 415 000 | 40.58 / 5 960.530 / 35.670 | 1.16 / 1.54 / 1.04 |
| Tower of London | 1 576 | 569 / 547 / 804 | 140 000 / 150 000 / 130 000 | 8.283 / 3 081.750 / 7.531 | 0.59 / 1.54 / 0.61 |
| Nyc_library | 2 550 | 511 / 400 / 710 | 110 000 / 71 000 / 133 000 | 10.056 / 60.425 / 8.563 | 0.67 / 1.84 / 0.66 |
| Union_square | 5 961 | 839 / 649 / 1 611 | 95 000 / 90 000 / 109 000 | 9.27 / 938.617 / 8.900 | 0.68 / 3.22 / 0.67 |

Note: bold indicates the best result among the compared methods; VSFM = visual SFM.

# References

• [1] Zhao Q P. Overview of virtual reality[J]. Science in China: Information Science, 2009, 39(1): 2–46. [赵沁平. 虚拟现实综述[J]. 中国科学F辑:信息科学, 2009, 39(1): 2–46. ] [DOI:10.1360/zf2009-39-1-2]
• [2] Forsyth D A, Ponce J. Computer Vision: A Modern Approach[M]. Beijing: Electronic Industry Press, 2012: 110. [Forsyth D A, Ponce J. 计算机视觉——一种现代方法[M]. 北京: 电子工业出版社, 2012: 110.]
• [3] Lyu L, Yao T Z, Song J T, et al. Design and implementation of 3D reconstruction system based on monocular vision[J]. Computer Engineering, 2018, 44(12): 233–239. [吕立, 姚拓中, 宋加涛, 等. 基于单目视觉三维重建系统的设计与实现[J]. 计算机工程, 2018, 44(12): 233–239. ] [DOI:10.19678/j.issn.1000-3428.0047936]
• [4] Moulon P, Monasse P, Marlet R. Global fusion of relative motions for robust, accurate and scalable structure from motion[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE, 2013: 3248-3255. [DOI: 10.1109/ICCV.2013.403]
• [5] Skilling J, Gull S F. Algorithms and Applications[M]//Smith C R, Grandy W T. Maximum Entropy and Bayesian Methods in Inverse Problems. Dordrecht: Reidel Publishing Company, 1985: 83-132.
• [6] Yao T Z, Zuo W H, An P, et al. Wide-baseline 3D reconstruction with semantic prior fusion and progressive depth optimization[J]. Journal of Image and Graphics, 2019, 24(4): 603–614. [姚拓中, 左文辉, 安鹏, 等. 融合语义先验和渐进式深度优化的宽基线3维场景重建[J]. 中国图象图形学报, 2019, 24(4): 603–614. ] [DOI:10.11834/jig.180477]
• [7] Anand A, Koppula H S, Joachims T, et al. Contextually guided semantic labeling and search for three-dimensional point clouds[J]. The International Journal of Robotics Research, 2013, 32(1): 19–34. [DOI:10.1177/0278364912461538]
• [8] Tong S, Xu X G, Yi C T, et al. Overview on vision-based 3D reconstruction[J]. Application Research of Computers, 2011, 28(7): 2411–2417. [佟帅, 徐晓刚, 易成涛, 等. 基于视觉的三维重建技术综述[J]. 计算机应用研究, 2011, 28(7): 2411–2417. ] [DOI:10.3969/j.issn.1001-3695.2011.07.003]
• [9] Snavely N, Seitz S M, Szeliski R. Photo tourism: exploring photo collections in 3D[J]. ACM Transactions on Graphics, 2006, 25(3): 835–846. [DOI:10.1145/1141911.1141964]
• [10] Snavely N, Seitz S M, Szeliski R. Modeling the world from internet photo collections[J]. International Journal of Computer Vision, 2008, 80(2): 189–210. [DOI:10.1007/s11263-007-0107-3]
• [11] Shah R, Deshpande A, Narayanan P J. Multistage SFM: revisiting incremental structure from motion[C]//Proceedings of the 2nd International Conference on 3D Vision. Tokyo, Japan: IEEE, 2014, 1: 417-424.[DOI: 10.1109/3DV.2014.95]
• [12] Schönberger J L, Frahm J M. Structure-from-motion revisited[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 4104-4113. [DOI: 10.1109/CVPR.2016.445]
• [13] Farenzena M, Fusiello A, Gherardi R. Structure-and-motion pipeline on a hierarchical cluster tree[C]//Proceedings of the 12th International Conference on Computer Vision Workshops. Kyoto, Japan: IEEE, 2009: 1489-1496. [DOI: 10.1109/ICCVW.2009.5457435]
• [14] Agudo A, Moreno-Noguer F, Calvo B, et al. Sequential non-rigid structure from motion using physical priors[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(5): 979–994. [DOI:10.1109/TPAMI.2015.2469293]
• [15] Cui H N, Shen S H, Hu Z Y. Global fusion of generalized camera model for efficient large-scale structure from motion[J]. Science China Information Sciences, 2017, 60(3): 038101. [DOI:10.1007/s11432-015-0792-1]
• [16] Lee M, Cho J, Choi C H, et al. Procrustean normal distribution for non-rigid structure from motion[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 1280-1287.[DOI: 10.1109/CVPR.2013.169]
• [17] Magerand L, Del Bue A. Revisiting projective structure from motion: a robust and efficient incremental solution[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. [DOI:10.1109/TPAMI.2018.2849973]
• [18] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110. [DOI:10.1023/b:visi.0000029664.99615.94]
• [19] Jia D, Zhu N D, Yang N H, et al. Image matching methods[J]. Journal of Image and Graphics, 2019, 24(5): 677–699. [贾迪, 朱宁丹, 杨宁华, 等. 图像匹配方法研究综述[J]. 中国图象图形学报, 2019, 24(5): 677–699. ] [DOI:10.11834/jig.180501]
• [20] Wu C C. Towards linear-time incremental structure from motion[C]//Proceedings of 2013 International Conference on 3D Vision. Seattle, WA, USA: IEEE, 2013: 127-134. [DOI: 10.1109/3DV.2013.25]
• [21] Hartley R, Zisserman A. Multiple View Geometry in Computer Vision[M]. Cambridge: Cambridge University Press, 2003.
• [22] Lepetit V, Moreno-Noguer F, Fua P. EPnP:an accurate O(n) solution to the PnP problem[J]. International Journal of Computer Vision, 2009, 81(2): 155–166. [DOI:10.1007/s11263-008-0152-6]
• [23] Moulon P, Monasse P, Marlet R. Adaptive structure from motion with a contrario model estimation[C]//Proceedings of Asian Conference on Computer Vision. Daejeon, Korea: Springer, 2012: 257-270.[DOI: 10.1007/978-3-642-37447-0_20]
• [24] Zhang F, Xu Z H, Shi L M, et al. Automatic reconstruction of distant scenes from wide baseline images[J]. Journal of Computer-Aided Design & Computer Graphics, 2010, 22(2): 256–263. [张峰, 许振辉, 史利民, 等. 基于宽基线图像远距离场景的自动三维重建[J]. 计算机辅助设计与图形学学报, 2010, 22(2): 256–263. ]
• [25] Yang S, Wu F C. Weighted linear methods for the camera pose estimation[J]. Journal of Software, 2011, 22(10): 2476–2487. [杨森, 吴福朝. 摄像机位姿的加权线性算法[J]. 软件学报, 2011, 22(10): 2476–2487. ] [DOI:10.3724/SP.J.1001.2011.03916]
• [26] Ferraz L, Binefa X, Moreno-Noguer F. Very fast solution to the PnP problem with algebraic outlier rejection[C]//Proceedings of 2014 Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 501-508.[DOI: 10.1109/CVPR.2014.71]
• [27] Cui H N, Gao X, Shen S H, et al. HSfM: hybrid structure-from-motion[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 1212-1221.[DOI: 10.1109/CVPR.2017.257]
• [28] Wilson K, Snavely N. Robust global translations with 1DSfM[C]//European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 61-75.[DOI: 10.1007/978-3-319-10578-9_5]