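Yao Tuozhong, Zuo Wenhui, An Peng, Song Jiatao (School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016; College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027)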
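Objective Vision-based 3D scene reconstruction has been widely applied in robot navigation, aerial map building, augmented reality, and other fields. However, when the camera undergoes large motion, traditional 3D reconstruction methods that rely on narrow-baseline constraints fail to work properly. Method For wide-baseline environments, a 3D scene reconstruction algorithm that fuses high-level semantic priors is proposed. Built on a Markov random field (MRF) model, the method combines multiple superpixel features, including appearance, co-linearity, co-planarity, and depth, to infer the 3D position and orientation of each superpixel in images from different views, thereby achieving an initial 3D reconstruction under wide-baseline conditions. In addition, high-level semantic priors are used recursively to merge superpixels with similar depths, so that the scene depth and the 3D model are progressively optimized. Result Experimental results show that, in a variety of wide-baseline environments and especially under fairly violent camera motion, the proposed method still achieves more stable and accurate depth estimation and 3D scene reconstruction than traditional methods. Conclusion This paper shows how to combine multiple image features with triangulation-based geometric features to build an accurate 3D scene model under wide-baseline conditions. The method uses an MRF model to jointly infer the 3D position and orientation of superpixels in images from different views, and uses high-level semantic priors to guide the reconstruction process. A recursive framework is also used to progressively optimize the scene depth. Experimental results show that, across different wide-baseline environments, the proposed method obtains 3D scene models that are closer to the true scene than those of traditional methods.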
Wide-baseline 3D reconstruction with semantic prior fusion and progressive depth optimization
Yao Tuozhong, Zuo Wenhui, An Peng, Song Jiatao (School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016, China; College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China)
Objective As a research hotspot in computer vision, 3D scene reconstruction has been widely used in many fields, such as unmanned driving, digital entertainment, aeronautics, and astronautics. Traditional scene reconstruction methods use structure from motion to iteratively estimate the camera pose and a sparse or dense 3D scene model from image sequences taken from multiple views. However, large motion between cameras usually leads to occlusion and geometric deformation; this situation is common in real applications and significantly increases the difficulty of image matching. Most previous works, including sparse and dense reconstruction methods, are effective only in narrow-baseline environments, and wide-baseline 3D reconstruction is a considerably more difficult problem. The problem arises in many applications, such as robot navigation, aerial map building, and augmented reality, and is therefore worth studying. In recent years, several semantic fusion-based solutions have been proposed and have become a development trend because they are more consistent with human cognition of the scene. Method A novel wide-baseline dense 3D scene reconstruction algorithm that integrates the attributes of outdoor structural scenes with high-level semantic priors is proposed. Our algorithm has the following characteristics. 1) Superpixels, which cover a larger image area than individual pixels, are used as the geometric primitives for image representation, with the following advantages. First, they increase the robustness of region correlation in weakly textured environments. Second, they describe the actual boundaries of objects in the scene and the discontinuities in depth. Third, they reduce the number of graph nodes in the Markov random field (MRF) model, which markedly reduces the computational complexity of solving the energy minimization problem.
2) An MRF model is utilized to estimate the 3D position and orientation of each superpixel in the different view images on the basis of multiple low-level features. In our MRF energy function, the unary potential models the planar parameters of each superpixel and penalizes the relative error between the estimated and ground truth depths. The pairwise potential models three geometric relations between adjacent superpixels, namely, co-linearity, connectivity, and co-planarity. In addition, a new potential is added to model the relative error between the triangulated and estimated depths. 3) The depth and 3D model of the scene are progressively optimized within our iterative framework by merging superpixels with similar depths according to high-level semantic priors. When adjacent superpixels have similar depths, they are merged into a larger superpixel, which further reduces the possibility of depth discontinuity. The segmentation image obtained after superpixel merging is used in the next iteration of MRF-based depth estimation. The maximum a posteriori (MAP) inference of our MRF model can be solved efficiently by classic linear programming. Result We use several classic wide-baseline image sequences, such as "Stanford Ⅰ, Ⅱ, Ⅲ, and Ⅳ", "Merton College Ⅲ", "University Library", and "Wadham College", to evaluate the performance of our wide-baseline 3D scene reconstruction algorithm. Experimental results demonstrate that our algorithm estimates large camera motion more accurately than the classic method and recovers more robust and accurate depth estimates and 3D scene models. Our algorithm works effectively in both narrow- and wide-baseline environments and is especially suitable for large-scale scene reconstruction. Conclusion This study shows how to recover an accurate 3D scene model based on multiple image features and triangulated geometric features in wide-baseline environments.
We use an MRF model to estimate the planar parameters of superpixels in different views, and high-level semantic priors are integrated to guide the merging of superpixels with similar depths. Furthermore, an iterative framework is proposed to optimize the scene depth and the 3D scene model progressively. Experimental results show that our proposed algorithm achieves more accurate 3D scene models than the classic algorithm on different wide-baseline image datasets.
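The superpixel merging step described in the Method part can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `merge_similar_superpixels`, the relative-depth tolerance `rel_tol`, and the use of one semantic class label per superpixel as the "high-level semantic prior" are all assumptions made for illustration. The sketch performs a single merging pass with a union-find structure; in the framework described above, the merged segmentation would then be passed back to the MRF-based depth estimation for the next iteration.

```python
from typing import Dict, List, Tuple


def merge_similar_superpixels(
    depths: Dict[int, float],
    labels: Dict[int, str],
    adjacency: List[Tuple[int, int]],
    rel_tol: float = 0.05,
) -> Dict[int, int]:
    """Map each superpixel id to the id of the merged group it belongs to.

    Assumptions (hypothetical, not taken from the paper):
    - depths holds one positive depth estimate per superpixel;
    - labels holds one semantic class per superpixel;
    - two adjacent superpixels are merged when they share a class and
      their depths differ by at most rel_tol in relative terms.
    """
    parent = {sp: sp for sp in depths}

    def find(sp: int) -> int:
        # Follow parent links to the group root, shortening the path as we go.
        while parent[sp] != sp:
            parent[sp] = parent[parent[sp]]
            sp = parent[sp]
        return sp

    for a, b in adjacency:
        same_class = labels[a] == labels[b]
        rel_diff = abs(depths[a] - depths[b]) / max(depths[a], depths[b])
        if same_class and rel_diff <= rel_tol:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the smaller id as the representative of the group.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {sp: find(sp) for sp in depths}


if __name__ == "__main__":
    depths = {0: 10.0, 1: 10.2, 2: 10.3, 3: 25.0}
    labels = {0: "building", 1: "building", 2: "ground", 3: "building"}
    adjacency = [(0, 1), (1, 2), (1, 3)]
    # Superpixels 0 and 1 merge; 2 differs in class, 3 differs in depth.
    print(merge_similar_superpixels(depths, labels, adjacency))
```

Because merging is transitive, a chain of pairwise-similar superpixels ends up in one group even if its two ends differ by more than `rel_tol`; a stricter variant could compare each candidate against the mean depth of the group instead.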