
Li Menghan1, Gao Xiang2, Xie Zexiao1, Shen Shuhan2 (1. Ocean University of China; 2. Institute of Automation, Chinese Academy of Sciences)

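Abstract
Objective Global structure from motion (SfM) recovers the absolute poses of all cameras at once through motion averaging and is therefore relatively efficient. Translation averaging, the part of motion averaging that solves for the absolute camera locations in the world coordinate system, is difficult because of scale ambiguity, estimation sensitivity, and solution uncertainty. This paper proposes a translation averaging method based on incremental scale estimation that eliminates the scale ambiguity while improving robustness and accuracy. Method Specifically, the translation averaging problem is decoupled into three sub-problems: 1) incremental estimation of the local absolute scales; 2) incremental estimation of the global absolute scales; 3) scale-aware absolute location estimation based on L1 optimization. Result In comparative experiments on the 1DSfM dataset, the baseline estimation accuracy improves markedly, and the percentage of cameras solved averages 96%. When two different sets of absolute rotations are used, the median absolute location error is only slightly worse than that of BATA (bilinear angle-based translation averaging) and CReTA (correspondence reweighted translation averaging), ranking third, while the mean error improves more clearly, ranking first and second, respectively. Compared with the original method, the proposed method substantially improves both the number of cameras solved and the accuracy of the estimated locations. Conclusion The proposed method combines scale separation with incremental parameter estimation; it eliminates the scale ambiguity while ensuring robustness and efficiency, and the recovered absolute camera locations are stable and reliable.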
Incremental scale estimation-based camera location recovery

(Institute of Automation, Chinese Academy of Sciences)

Objective Structure from motion (SfM) is the fundamental step of sparse reconstruction and is widely applied in remote sensing mapping, indoor modeling, augmented reality, and the preservation of ancient architecture. SfM recovers camera poses from images and falls into two main categories: incremental and global. Unlike incremental SfM, which is iterative, global SfM recovers the absolute poses of all cameras simultaneously through motion averaging; it is relatively more efficient but still faces challenges in robustness and accuracy. Rotation averaging and translation averaging are the two key components of motion averaging. Translation averaging is the more difficult of the two for three reasons: 1) Essential matrix estimation and decomposition recover only relative translation directions, i.e., the resulting relative translations are scale-ambiguous; 2) Translation averaging can uniquely determine the absolute locations only of cameras within the same parallel rigid component, whereas rotation averaging merely requires the cameras to lie in the same connected component; 3) Compared with relative rotation, the estimation of relative translation is more vulnerable to feature point mismatches and more likely to be outlier-contaminated. Among traditional approaches, the translation averaging method based on scale separation (L1SE-L1TA) computes the relative baseline lengths between cameras before estimating the absolute locations. This eliminates the scale ambiguity and frees the solvable range from the constraint of camera triplets, but the robustness and accuracy of the method still need improvement.
Incremental translation averaging (ITA) was the first to introduce incremental parameter estimation into translation averaging, and it achieves good robustness and high accuracy. However, its solving process depends on camera triplets and may degenerate under collinear camera motion. To address these problems, this paper proposes a translation averaging method based on incremental scale estimation (ISE-L1TA), which eliminates the scale ambiguity and improves both the robustness of the method and the accuracy of its results. Method Incremental SfM has proven to be highly accurate and robust, which makes it the preferred choice for many applications. It has been shown to be particularly effective at handling large datasets and complex real-world scenes. Recognizing this potential, researchers have transferred the idea of incremental parameter estimation to related tasks, giving rise to incremental rotation averaging (IRA) and ITA. The IRA algorithm is designed to estimate the absolute camera rotations incrementally and efficiently, while the ITA algorithm does the same for the absolute camera locations, which allows it to reject outliers effectively and avoid error propagation. The adoption of incremental parameter estimation in motion averaging thus demonstrates the versatility and effectiveness of the idea, which holds promise for future research in 3D reconstruction and beyond. In this paper, ISE-L1TA is proposed by combining the scale separation strategy with incremental parameter estimation. Specifically, the translation averaging problem is decomposed into three sub-problems that are solved sequentially: 1) Incremental estimation of local absolute scale. 2) Incremental estimation of global absolute scale.
3) Scale-aware absolute location estimation based on L1 optimization. The input of the proposed method is the pairwise SIFT (scale-invariant feature transform) point matches, and its output is the absolute camera locations. First, the relative motion between cameras is obtained by estimating and decomposing the essential matrix. Next, two-view triangulation is performed to compute the relative depths in the local coordinate system. Based on the depth ratios, the local and global absolute scales are estimated incrementally. Subsequently, the relative baseline lengths between cameras are computed and rotation averaging is performed to estimate the absolute rotations, which enables the final scale-aware absolute location estimation. Result Experiments were first conducted to evaluate the choice of scale distance function and scale distance threshold. The results confirm that the normalized perfect square deviation function effectively eliminates the impact of scaling effects. Furthermore, the incremental scale estimation (ISE) method is robust, is insensitive to the scale distance threshold, and achieves significantly higher baseline accuracy than L1SE. In addition, compared with several state-of-the-art methods, namely BATA (bilinear angle-based translation averaging), CReTA (correspondence reweighted translation averaging), ITA, and L1SE-L1TA, the proposed method performs as follows: 1) In terms of the number of cameras solved, the average percentage of successfully solved cameras is 96%. 2) In terms of the median error of absolute location estimation, it is slightly worse than BATA and CReTA and ranks third overall under different absolute rotations. 3) In terms of the mean error of absolute location estimation, it shows a clear advantage, ranking first and second, respectively.
Compared with the original L1SE-L1TA, the proposed method substantially improves both the number of cameras solved and the accuracy of the estimated locations. All experiments are performed on the 1DSfM dataset. Conclusion The proposed method combines scale separation with incremental parameter estimation. By integrating these two ideas, it effectively eliminates the scale ambiguity while ensuring effective outlier rejection and keeping the solving process concise. As a result, the recovered absolute camera locations are stable and reliable.
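
The depth-ratio idea mentioned in the Method part can be illustrated with a small sketch. It is not the authors' ISE algorithm, only the underlying geometric relation under stated assumptions: two image pairs (i, j) and (i, k) share camera i, each pair is triangulated with a unit-length baseline, and d_ij and d_ik denote the resulting depths of a shared point in camera i. Since the metric depth D satisfies D = s_ij * d_ij = s_ik * d_ik, every shared point votes d_ij / d_ik for the baseline ratio s_ik / s_ij. The function name `relative_scale`, the dictionary layout, and the use of a median as the robust estimator are all assumptions made for illustration.

```python
from statistics import median


def relative_scale(depths_ij, depths_ik):
    """Estimate the baseline ratio s_ik / s_ij from depths measured in camera i.

    depths_ij maps a track id to the depth of that point in camera i, as
    triangulated from pair (i, j) with a unit-length baseline; depths_ik is
    the same for pair (i, k). Both layouts are assumptions of this sketch.
    """
    shared = depths_ij.keys() & depths_ik.keys()
    # The metric depth D of a shared point satisfies D = s_ij * d_ij = s_ik * d_ik,
    # so each point votes d_ij / d_ik for the ratio s_ik / s_ij.
    ratios = [
        depths_ij[t] / depths_ik[t]
        for t in shared
        if depths_ij[t] > 0 and depths_ik[t] > 0  # keep points in front of camera i
    ]
    if not ratios:
        raise ValueError("no shared points with positive depth")
    # The median tolerates a minority of mismatched tracks.
    return median(ratios)


# Synthetic check: baselines s_ij = 2.0 and s_ik = 0.5, so s_ik / s_ij = 0.25.
true_depths = {1: 4.0, 2: 6.0, 3: 10.0}
depths_ij = {t: d / 2.0 for t, d in true_depths.items()}
depths_ik = {t: d / 0.5 for t, d in true_depths.items()}
depths_ij[4], depths_ik[4] = 1.0, 1.0  # one mismatched track voting 1.0
ratio = relative_scale(depths_ij, depths_ik)
```

In this synthetic case three consistent tracks outvote the single mismatched one, so the estimate stays at the true ratio. An incremental scheme such as the one summarized above would presumably chain ratios of this kind from a seed pair outward, but the ordering and acceptance criteria are specific to the paper and are not reproduced here.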