Map recovery and fusion in monocular simultaneous localization and mapping

Zhang Jianhua, Wang Yanyan, Wang Zengyuan, Chen Shengyong, Guan Qiu (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)

Abstract
Objective After tracking failure, a traditional monocular visual SLAM (simultaneous localization and mapping) system requires the camera to return to the position where tracking was lost before it can relocalize and resume mapping, which greatly limits the application scenarios of monocular SLAM. To solve this problem, a map recovery and fusion algorithm based on visual-inertial sensor fusion is proposed. Method When tracking fails, the camera pose is provided solely by the inertial sensor; the system is re-initialized, and the map built before the loss is fused into the current map using the camera poses over the lost period supplied by the inertial sensor. To handle the camera pose error introduced by inertial integration while visual tracking is lost, a jumping matching search strategy based on the covisibility relationship between keyframes is proposed to quickly obtain matched map points; the motion estimate between the matched points is then solved by nonlinear optimization to compensate for the error and obtain a more accurate camera pose, and the duplicated point cloud left after fusion is pruned. Finally, links between the keyframes and between the map points of the two maps are established and used to jointly optimize camera poses and map point positions in subsequent tracking and mapping. Result Map accuracy and map completeness are evaluated on the EuRoC dataset and other data. For accuracy, the trajectories produced by the proposed algorithm are compared with the ground truth and with the trajectories obtained without loss; the results show that when SLAM tracking fails, the method effectively solves the problem of the system being unable to continue tracking and mapping, with centimeter-level accuracy: the error is only 9 cm in a 30 m² indoor environment and only 7 cm in a 300 m² factory environment. For completeness, when the camera moves violently, the recovered map is more complete than that of ORB_SLAM's relocalization algorithm, containing 30% more keyframes. Conclusion After tracking failure of a monocular visual SLAM system, the proposed algorithm can continue tracking and mapping without losing the camera trajectory. Moreover, the camera does not need to return to the scene visited before the loss; observing part of that scene is enough to recover and fuse all maps. The algorithm guarantees both the accuracy and the completeness of the recovered map, and compared with traditional relocalization it performs better when tracking fails while the map is still small.
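To make the coordinate-transformation step concrete, the following is a minimal Python sketch (using NumPy) of how the lost map can be moved into the re-initialized map's frame via the IMU-integrated motion across the lost period. The function and variable names are illustrative, not the authors' implementation, and both maps are assumed to have been metrically scaled by visual-inertial alignment.

```python
import numpy as np

def transform_lost_map(points_old, T_w1_c1, T_c1_c2, T_w2_c2):
    """Express the lost map's points in the new map's world frame.

    points_old: (N, 3) map points in the old world frame W1
    T_w1_c1:    4x4 camera-to-world pose of the last frame tracked before the loss
    T_c1_c2:    4x4 relative camera motion over the lost period, integrated from IMU
    T_w2_c2:    4x4 camera-to-world pose of the first keyframe of the
                re-initialized map (world frame W2)
    """
    # Chain W1 -> C1 -> C2 -> W2:
    #   p_w2 = T_w2_c2 * inv(T_c1_c2) * inv(T_w1_c1) * p_w1
    T_w2_w1 = T_w2_c2 @ np.linalg.inv(T_c1_c2) @ np.linalg.inv(T_w1_c1)
    homog = np.hstack([points_old, np.ones((len(points_old), 1))])
    return (homog @ T_w2_w1.T)[:, :3]
```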
Keywords
Recovery of lost map for monocular simultaneous localization and mapping

Zhang Jianhua, Wang Yanyan, Wang Zengyuan, Chen Shengyong, Guan Qiu (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)

Abstract
Objective A traditional monocular SLAM system requires the camera to return to the position where tracking was lost before it can restart localization and mapping, which greatly limits the application scenarios of monocular SLAM. Thus, a new and highly adaptive recovery strategy is required. In a traditional keyframe-based SLAM system, relocalization is achieved by finding the keyframes (a reference camera pose and image) that match the current frame to estimate the camera pose. Localization is therefore possible only at previously visited places where keyframes were added to the map. In practical applications, relocalization is not a user-friendly solution because users prefer to walk into previously unseen areas where no associated keyframes exist. Moreover, finding matches between two frames is difficult when the scene changes considerably, so resuming tracking can be difficult even when the camera returns to a place it has already visited. The motion and map information during the lost period also cannot be recovered, which leads to an incomplete map. Thus, this study proposes a strategy that combines monocular vision with an inertial measurement unit (IMU) to recover the lost map through map fusion. Method The ORB_SLAM system incorporates three threads that run in parallel: tracking, local mapping, and loop closing. Our algorithm builds on ORB_SLAM. When tracking fails under motion blur or occlusion, our system immediately starts a new map. The map created before tracking was lost is saved, and relocalization is achieved by fusing the two maps. The new map is created by re-initialization. Unlike relocalization, initialization is always achievable; however, the scale of a camera trajectory computed by monocular ORB_SLAM is arbitrary, so the inconsistent scales must be resolved before the two maps can be fused. The absolute scale of each map is calculated by combining monocular vision with the IMU: the IMU can recover the metric scale for monocular vision and is not affected by the limitations of vision. The camera position is computed from IMU data until re-initialization is completed, and this pose information during the lost period serves as a bridge for fusing the two maps once the new map is initialized. The map fusion process consists of three steps. 1) Coordinate transformation. The coordinate system of the new map changes after re-initialization. The transformation between the new map and the lost map can be computed from the IMU measurements during the lost period and is applied to the lost map to move it into the new map's frame. 2) Matching strategy. The IMU measurements unavoidably contain errors that affect the result of map fusion; thus, visual information is used to eliminate these errors. A jumping matching search strategy based on the covisibility relationship between keyframes is proposed to reduce matching time: keyframes selected from the two maps are matched according to a covisibility criterion instead of being matched one by one, which saves a considerable amount of matching time. 3) Error estimation between the two maps. After a set of matched map points is obtained, the motion estimate between the matched points is solved by nonlinear optimization and applied to the two maps to reduce the error. Finally, the new map points among the matched pairs are erased, and the two maps are merged.
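The jumping matching search of step 2 can be sketched as follows. This is an illustrative reading of the covisibility-based idea, not the authors' exact rule: when a candidate keyframe from the lost map fails to match the query keyframe, all keyframes strongly covisible with it are skipped, since they observe largely the same scene and are unlikely to match either. `KeyFrame` and `match_fn` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class KeyFrame:
    """Minimal stand-in for a keyframe; `covisible` holds the ids of
    keyframes that share many observed map points with this one."""
    kf_id: int
    covisible: set = field(default_factory=set)

def jumping_match_search(old_keyframes, query_kf, match_fn, min_matches=20):
    """Scan the lost map's keyframes, jumping over the covisible neighbors
    of every candidate that fails to match."""
    i = 0
    while i < len(old_keyframes):
        candidate = old_keyframes[i]
        matches = match_fn(candidate, query_kf)  # e.g., ORB descriptor matching
        if len(matches) >= min_matches:
            return candidate, matches
        i += 1
        # Jump: skip keyframes strongly covisible with the failed candidate.
        while i < len(old_keyframes) and old_keyframes[i].kf_id in candidate.covisible:
            i += 1
    return None, []
```

For step 3, the motion estimate between matched map points can be solved with an off-the-shelf nonlinear least-squares routine. The sketch below, assuming SciPy is available, fits a 6-DoF rigid correction with a robust Huber loss to tolerate wrong matches; the paper may use a different parameterization or solver.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def estimate_correction(p_old, p_new):
    """Fit the rigid transform (R, t) minimizing sum ||R p_old + t - p_new||^2
    over matched (N, 3) point arrays, parameterized as rotation vector + t."""
    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        return ((p_old @ R.T + x[3:]) - p_new).ravel()

    sol = least_squares(residuals, np.zeros(6), loss="huber")
    R = Rotation.from_rotvec(sol.x[:3]).as_matrix()
    return R, sol.x[3:]
```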
The relationships between the keyframes and between the map points of the old and new maps are established and used to jointly optimize the camera poses and map point positions in subsequent tracking and mapping. Result The algorithm is tested on an open dataset (visual-inertial datasets collected onboard a micro aerial vehicle). Results show that our method effectively solves the system's inability to keep mapping and localizing when SLAM tracking fails. The experiments cover map accuracy and map completeness. In terms of accuracy, the trajectory of the fused map achieves a typical precision of 8 cm against the ground truth in a 30 m² room environment and in a 300 m² industrial environment. Comparing the fused trajectory with the trajectory obtained without loss, the precision reaches 4 cm. When the camera moves violently, the map recovered by our algorithm is more complete than that recovered by the ORB_SLAM relocalization algorithm: our result contains 30% more keyframes. Conclusion The proposed algorithm can resume tracking by merging maps when tracking is lost and can recover the trajectory during the lost period. Compared with a traditional SLAM system, our relocalization strategy does not require the camera to return to a previously visited place; it only needs to observe part of the scene seen before the loss. Our algorithm is also more adaptive than ORB_SLAM when tracking fails soon after initialization.
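A minimal sketch of the joint optimization mentioned above: once cross-map associations exist, keyframe poses and map points from both maps can be refined together by minimizing reprojection error. The data layout and intrinsics handling here are assumptions for illustration; a real SLAM system would use a sparse solver such as g2o rather than a dense SciPy one.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def joint_refine(poses0, points0, observations, K):
    """Jointly refine keyframe poses and map points from both maps.

    poses0:       (M, 6) initial world-to-camera poses as [rotation vector | t]
    points0:      (N, 3) initial map point positions
    observations: iterable of (cam_idx, pt_idx, u, v) pixel measurements
    K:            3x3 camera intrinsic matrix
    """
    M, N = len(poses0), len(points0)

    def residuals(x):
        poses = x[:6 * M].reshape(M, 6)
        pts = x[6 * M:].reshape(N, 3)
        res = []
        for ci, pi, u, v in observations:
            R = Rotation.from_rotvec(poses[ci, :3]).as_matrix()
            pc = R @ pts[pi] + poses[ci, 3:]   # point in camera frame
            uvw = K @ pc                       # pinhole projection
            res += [uvw[0] / uvw[2] - u, uvw[1] / uvw[2] - v]
        return np.asarray(res)

    x0 = np.concatenate([np.ravel(poses0), np.ravel(points0)])
    sol = least_squares(residuals, x0, loss="huber")
    return sol.x[:6 * M].reshape(M, 6), sol.x[6 * M:].reshape(N, 3)
```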
Keywords
