Objective Vision-based 3D scene reconstruction has been widely applied in fields such as robot navigation, aerial map building and augmented reality. However, large camera motion prevents traditional 3D reconstruction methods, which rely on the narrow-baseline assumption, from working properly. Method For wide-baseline environments, this paper proposes a 3D scene reconstruction algorithm that fuses high-level semantic priors. Built on a Markov random field (MRF) model, the method combines multiple superpixel features, including appearance, co-linearity, co-planarity and depth, to jointly infer the 3D position and orientation of each superpixel across different views, thereby achieving initial 3D reconstruction under wide-baseline conditions. Meanwhile, superpixels with similar depths are recursively merged under the guidance of high-level semantic priors, so that the scene depth and the 3D model are progressively optimized. Result Experiments show that in a variety of wide-baseline environments, especially when the camera motion is severe, the proposed method still obtains more stable and accurate depth estimation and 3D scene reconstruction than traditional methods. Conclusion This paper shows how to combine multiple image features with triangulation-based geometric features to build accurate 3D scene models under wide-baseline conditions. The method uses an MRF model to jointly infer the 3D positions and orientations of superpixels in different views, with high-level semantic priors guiding the reconstruction process. A recursive framework is further employed to progressively optimize the scene depth. Experiments demonstrate that, in different wide-baseline environments, the proposed method produces 3D scene models that describe the real scene more faithfully than traditional methods.
Wide-baseline 3D reconstruction with semantic prior fusion and progressive depth optimization
Yao Tuozhong, Zuo Wenhui, An Peng, Song Jiatao (Information Science and Electronic Engineering, Zhejiang University; School of Electronic and Information Engineering, Ningbo University of Technology)
Objective As a research hotspot in computer vision, 3D scene reconstruction has been widely used in many fields such as unmanned driving, digital entertainment, aeronautics and astronautics. Traditional scene reconstruction methods iteratively estimate camera poses and sparse or dense 3D scene models from multi-view image sequences via structure from motion. However, large motion between cameras usually causes occlusion and geometric deformation, which often appear in real applications and significantly increase the difficulty of image matching. Most previous work, both sparse and dense, only works well in narrow-baseline environments, and wide-baseline 3D reconstruction is a much harder problem. It arises in many applications, such as robot navigation, aerial map building and augmented reality, and is therefore well worth studying. In recent years, several semantic-fusion-based solutions have been proposed; they have become a development trend because they are more consistent with human cognition of the scene. Method This paper proposes a novel wide-baseline dense 3D scene reconstruction algorithm that integrates both the structural attributes of outdoor scenes and high-level semantic priors. The algorithm has the following three characteristics: 1) Superpixels, which cover larger areas than individual pixels, are used as the geometric primitives for image representation. This has three advantages: first, it increases the robustness of region correlation in weakly textured environments; second, it captures both the real boundaries of objects in the scene and depth discontinuities; third, it reduces the number of graph nodes in the Markov random field model, which significantly lowers the computational complexity of solving the energy minimization problem.
2) A Markov random field (MRF) model is used to estimate both the 3D position and orientation of each superpixel in different views based on multiple low-level features. In our MRF energy function, the unary potential models the planar parameters of each superpixel and penalizes the relative error between the estimated and ground-truth depths. The pairwise potential models three geometric relations between adjacent superpixels: co-linearity, connectivity and co-planarity. In addition, a new potential is introduced to model the relative error between the triangulated depth and the estimated depth. 3) Both the depth and the 3D model of the scene are progressively optimized by merging superpixels with similar depths according to high-level semantic priors in an iterative framework. When adjacent superpixels have similar depths, they are merged into a larger superpixel, which further reduces the possibility of depth discontinuity. The segmentation obtained after superpixel merging is used in the next iteration of MRF-based depth estimation. MAP inference in our MRF model can be solved efficiently by classic linear programming. Result To evaluate the performance of our wide-baseline 3D scene reconstruction algorithm, we use several classic wide-baseline image sequences such as “Stanford I, II, III, IV”, “Merton College”, “University Library” and “Wadham College”. The detailed experimental results demonstrate that our algorithm estimates large camera motion more accurately than the classic method and recovers more robust and accurate depth estimates and 3D scene models. Our algorithm works well in both narrow-baseline and wide-baseline environments and is especially suitable for large-scale scene reconstruction. Conclusion This paper shows how to recover an accurate 3D scene model from both multiple image features and triangulated geometric features in wide-baseline environments.
We use an MRF model to estimate the planar parameters of superpixels in different views, and high-level semantic priors are integrated to guide the merging of superpixels with similar depths. Furthermore, an iterative framework is proposed to progressively optimize both the scene depth and the 3D scene model. Experiments show that the proposed algorithm achieves more accurate 3D scene models than the classic algorithm on different wide-baseline image datasets.
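The progressive optimization described in the Method section hinges on merging adjacent superpixels whose estimated depths are similar. As a minimal illustrative sketch only (the function name, the union-find merging strategy and the threshold value are our assumptions; the paper's actual merging criterion also involves high-level semantic priors, which are omitted here), one merging pass might look like:

```python
def merge_similar_superpixels(depths, adjacency, tau=0.1):
    """Merge adjacent superpixels whose estimated depths differ by less
    than tau, using union-find. `depths` maps superpixel id -> estimated
    depth; `adjacency` lists (i, j) neighbour pairs. Returns a new label
    for each original superpixel; merged groups share one label."""
    parent = list(range(len(depths)))

    def find(x):
        # Path-halving find.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in adjacency:
        if abs(depths[i] - depths[j]) < tau:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # union the two regions

    # Relabel so each merged group gets a consecutive id.
    groups, labels = {}, []
    for k in range(len(depths)):
        labels.append(groups.setdefault(find(k), len(groups)))
    return labels

# Four superpixels: 0-1 and 2-3 have similar depths, 1-2 do not.
labels = merge_similar_superpixels(
    depths=[1.00, 1.05, 3.00, 3.02],
    adjacency=[(0, 1), (1, 2), (2, 3)],
    tau=0.1)
print(labels)  # → [0, 0, 1, 1]: two merged regions remain
```

The merged label map would then replace the original segmentation in the next iteration of MRF-based depth estimation, so each round operates on fewer, larger regions.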