Real-time camera pose tracking by locating image matching scales and regions

Miao Jinghua, Sun Yankui (Department of Computer Science and Technology, Tsinghua University, Beijing 100084)

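Abstract
Objective An effective algorithm is proposed for locating the scale and regions used in image matching. By matching the feature points of the current screen image against the feature points of only some regions at the corresponding scale of the template image, the camera tracks the template image in real time, which addresses the matching accuracy and efficiency problems of 3D tracking algorithms. Method In the preprocessing stage, the algorithm builds a multi-scale representation of the template image and partitions the image at each scale into regions. Within each region, feature points are extracted and descriptors are generated with the ORB (oriented FAST and rotated BRIEF) method, which yields a scheme that manages image feature points by scale level and by region. In the real-time tracking stage, for the image currently captured by the camera, the algorithm first locates the scale range that the image corresponds to, then determines the image regions within that scale range that overlap most with the current image, then matches the current image against the feature point sets of the corresponding scales and regions of the template image, and finally computes the camera pose from the matched point pairs. Result Experiments use template images of different resolutions from a public image database (Stanford mobile visual search dataset) together with additional images. The results show that the algorithm performs stably, with a registration error of about 1 pixel, and that the system frame rate generally stays at 20-30 frames/s. Conclusion Compared with several classic algorithms, the new method locates the image matching scale and regions better. This local feature point matching clearly improves registration accuracy and computational efficiency over existing methods, performs better still when the template image resolution is high, and is particularly suitable for mobile augmented reality applications.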
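The region partition used in the preprocessing stage can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the `Region` type, the `partition` function, and the rule of spreading equally sized windows evenly (so that neighbors overlap when the window size does not divide the image size) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular region of one layer image (pixels)."""
    x: int
    y: int
    w: int
    h: int


def _starts(length: int, size: int) -> List[int]:
    """Start offsets of equally sized windows that cover [0, length)."""
    if size >= length:
        return [0]
    n = -(-length // size)            # ceil(length / size), always >= 2 here
    step = (length - size) / (n - 1)  # step <= size, so windows may overlap
    return [round(i * step) for i in range(n)]


def partition(width: int, height: int,
              region_w: int, region_h: int) -> List[Region]:
    """Split a width x height layer image into same-size regions.

    Assumption: regions are laid out on a regular grid and overlap only
    as much as needed to cover the image exactly.
    """
    rw, rh = min(region_w, width), min(region_h, height)
    return [Region(x, y, rw, rh)
            for y in _starts(height, rh)
            for x in _starts(width, rw)]


if __name__ == "__main__":
    for region in partition(1000, 600, 400, 300):
        print(region)
```

In a full pipeline, each region would additionally carry its sub-image and the ORB feature points extracted inside it, so that matching can later be restricted to a few regions instead of the whole template.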
Keywords
Real-time camera pose tracking with locating image matching scales and regions

Miao Jinghua, Sun Yankui(Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)

Abstract
Objective In a conventional augmented reality system, multi-scale representations of a template image are constructed first. Feature points are extracted at each scale and pooled into a single template feature set, which is then matched against the feature points extracted from each camera image. When the number of scales is large, this template feature set becomes large as well. However, a camera image corresponds only to the layers whose scale is close to its own, and it typically overlaps those layers only in some regions. Conventional feature matching therefore performs a large amount of useless computation, which lowers both matching speed and registration accuracy. This paper proposes an effective method for locating the image matching scales and regions during camera pose tracking to address this problem. Features of the current camera image are matched only against the features of the corresponding scales and regions of the template image pyramid, and the camera pose is computed in real time from the resulting matching pairs, which addresses the feature matching accuracy and efficiency problems of traditional three-dimensional tracking methods. Method In the preprocessing stage, the scale-space layers of the template image are constructed first. Concretely, the original image is down-sampled by a factor of 1.5 to form the second layer. The remaining layers are formed by progressively half-sampling the original image and the second-layer image and interleaving the two sequences, subject to the condition that the resolution of the top layer is only just below that of the specified screen image. Second, the key frame structure of each layer image is built. 
Specifically, each layer image is partitioned into rectangular regions of equal size, which may overlap when necessary. The region size is chosen to be similar to the size of the layer image at the maximum scale in the scale-space layers. In each region, feature points are extracted and binary descriptors are generated with the oriented FAST and rotated BRIEF (ORB) algorithm; each region's position, sub-image, and feature points are stored together as one key frame structure. In this way, the feature descriptors of the image pyramid are managed by scale and region. In the real-time tracking stage, the scale range of the camera image within the image pyramid is located first. The image regions it covers within this scale range are then found using defined overlap-degree rules, which narrows the scope of feature matching between the current camera image and the template image pyramid and improves matching accuracy and efficiency through local feature matching. 1) Locating the scale range. A camera image captured at some distance from the template image essentially corresponds to a scale range in the template's image pyramid and overlaps some image regions within that range. The proposed method first predicts the current camera pose in two ways: by reusing the pose of the last frame and by Kalman filtering. The four vertices of the original image are then projected onto the screen image with the predicted pose. Finally, the size of the projected area is compared with the sizes of the layer images in the pyramid to determine the scale range. 2) Calculating the region overlap degree. All key frame regions of the layer images within the scale range are projected onto the screen image with the predicted camera pose, and the areas of the overlapping regions are calculated, from which the region overlap degree is computed by the proposed method. 
3) Local feature extraction and matching. A number of key frames with large region overlap degrees are selected using the last-frame camera pose as the pose estimate, and further key frames are selected in the same way using the pose estimate from Kalman filtering. The union of the two key frame sets is taken, all of their feature points are matched against those extracted from the camera image with the ORB algorithm, and the camera pose is computed from matching pairs. Result The new algorithm is implemented and run on a smartphone and tested on template images of different resolutions from an open image database (Stanford mobile visual search dataset) as well as on other template images. It is compared with four advanced algorithms, namely fast locating of image scale and area, ORB, FREAK (fast retina keypoint), and BRISK (binary robust invariant scalable keypoints). For the experiments, videos are recorded for all test template images, covering camera translation, rotation, and scaling relative to the template image. The optimal parameters of the ORB, FREAK, and BRISK algorithms are selected by analysis and testing, and registration error and frame rate are measured both before and after integrating our feature matching algorithm with the optical flow algorithm. Experimental results show that the new algorithm is robust, achieves a registration error of approximately one pixel, and performs real-time 3D tracking at 20-30 frames per second. Conclusion The algorithm locates the image matching scale and region better than previous methods. Compared with several classic algorithms, feature matching accuracy and speed between the current camera image and the template image improve markedly, especially when the resolution of the template image is high. The algorithm is suitable for tracking natural images on a mobile platform.
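The layer construction and the scale-range step 1) described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the predicted camera pose is stood in for by a hypothetical 3x3 homography `H`; the stopping rule adds layers until the first one smaller than the screen; and the "scale range" is taken to be the two adjacent layers that bracket the equivalent scale derived from the projected area. The function names are invented for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Sequence[Sequence[float]]


def layer_scales(width: int, height: int,
                 screen_w: int, screen_h: int) -> List[float]:
    """Scales 1, 1/1.5, 1/2, 1/3, 1/4, 1/6, ... (two interleaved halvings).

    Assumption: stop at the first layer smaller than the screen image.
    """
    scales: List[float] = []
    a, b = 1.0, 1.0 / 1.5
    while True:
        for s in (a, b):
            scales.append(s)
            if width * s < screen_w and height * s < screen_h:
                return scales
        a, b = a / 2.0, b / 2.0


def project(H: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def polygon_area(pts: Sequence[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, list(pts[1:]) + [pts[0]]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def locate_scale_range(H: Matrix, width: int, height: int,
                       scales: Sequence[float]) -> Tuple[int, int]:
    """Indices of the layers whose scale brackets the projected template."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    area = polygon_area([project(H, c) for c in corners])
    s = (area / (width * height)) ** 0.5  # equivalent uniform scale
    for i in range(len(scales) - 1):
        if scales[i] >= s >= scales[i + 1]:
            return (i, i + 1)
    last = len(scales) - 1
    return (0, 0) if s > scales[0] else (last, last)


if __name__ == "__main__":
    scales = layer_scales(3200, 2400, 640, 480)
    H = [[0.4, 0.0, 10.0], [0.0, 0.4, 20.0], [0.0, 0.0, 1.0]]
    print(scales, locate_scale_range(H, 3200, 2400, scales))
```

Running the sketch once with the last-frame pose and once with the Kalman-predicted pose, then taking the union of the selected layers and regions, would mirror the two-prediction strategy the abstract describes.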
Keywords
