面向室内场景的主被动融合视觉定位系统
Visual localization system of integrated active and passive perception for indoor scenes
2023, Vol. 28, No. 2, Pages: 522-534
Received: 2021-07-19
Revised: 2021-12-14
Accepted: 2021-12-21
Published in print: 2023-02-16
DOI: 10.11834/jig.210603
目的
视觉定位旨在利用易于获取的RGB图像对运动物体进行目标定位及姿态估计。室内场景中普遍存在的物体遮挡、弱纹理区域等干扰极易造成目标关键点的错误估计,严重影响了视觉定位的精度。针对这一问题,本文提出一种主被动融合的室内定位系统,结合固定视角和移动视角的方案优势,实现室内场景中运动目标的精准定位。
方法
提出一种基于平面先验的物体位姿估计方法,在关键点检测的单目定位框架基础上,使用平面约束进行3自由度姿态优化,提升固定视角下室内平面中运动目标的定位稳定性。基于无损卡尔曼滤波算法设计了一套数据融合定位系统,将从固定视角得到的被动式定位结果与从移动视角得到的主动式定位结果进行融合,提升了运动目标的位姿估计结果的可靠性。
结果
本文提出的主被动融合室内视觉定位系统在iGibson仿真数据集上的平均定位精度为2~3 cm,定位误差在10 cm内的准确率为99%;在真实场景中平均定位精度为3~4 cm,定位误差在10 cm内的准确率在90%以上,实现了cm级的定位精度。
结论
提出的室内视觉定位系统融合了被动式和主动式定位方法的优势,能够以较低设备成本实现室内场景中高精度的目标定位结果,并在遮挡、目标丢失等复杂环境因素干扰下展示出鲁棒的定位性能。
Objective
Visual localization aims to estimate the position and pose of moving objects from easily acquired RGB images. Hand-crafted features used in traditional computer vision methods often struggle to meet the requirements of this task, whereas the strong feature abstraction and representation ability of deep learning has made learning-based pose estimation an active research topic. The development of depth cameras and laser sensors offers further options, but such sensors impose constraints on the size and shape of the target, usually require a structured environment, and multi-camera rigs are often difficult to install and calibrate. In contrast, purely visual solutions are low-cost, impose few restrictions, and are easy to extend to unstructured scenarios. However, interferences that are common in indoor scenes, such as object occlusion and weakly textured regions, can easily cause incorrect estimation of target keypoints and severely degrade localization accuracy. According to how the camera is deployed, visual object pose estimation methods fall into two categories. 1) Monocular object pose estimation with cameras fixed in the scene: the target is detected in the captured images to obtain its position; the results are stable, but they are easily affected by illumination and image blur, and occlusion in the scene cannot be handled because of the limited observation angle (a minimal sketch of this scheme is given after this paragraph). 2) Scene reconstruction-based pose estimation with a camera mounted on the target itself: feature points of the scene are detected and matched against a 3D scene model built in advance to recover the target pose. The performance of this scheme depends on scene texture: in scenes with rich texture and distinctive features, accurate positioning results can be obtained, whereas in texture-less or weakly textured regions such as walls, the results become unstable and other sensors such as an inertial measurement unit (IMU) are needed to aid positioning. To localize moving objects in indoor scenes more precisely, we propose a visual localization system that integrates active and passive perception and combines the advantages of fixed and moving viewpoints.
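The following minimal sketch illustrates the fixed-camera (passive) scheme under simple assumptions: the 3D keypoints of the object model and the camera intrinsics are known, and OpenCV's PnP solver recovers the 6-DoF pose from 2D-3D keypoint correspondences. The function and variable names and the toy data are illustrative, not taken from the paper.

```python
# Sketch of keypoint-based monocular object pose estimation from a fixed camera:
# 2D keypoints detected in the image are matched to known 3D keypoints on the
# object model, and the pose is recovered with PnP.
import numpy as np
import cv2

def estimate_object_pose(kp3d_object, kp2d_image, camera_matrix):
    """Recover the 6-DoF object pose in the camera frame from 2D-3D keypoint pairs."""
    dist_coeffs = np.zeros(5)  # assume the image has already been undistorted
    ok, rvec, tvec = cv2.solvePnP(kp3d_object.astype(np.float64),
                                  kp2d_image.astype(np.float64),
                                  camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return ok, R, tvec

# Toy usage: the 8 corners of a 0.3 m cube observed by a 640x480 pinhole camera.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
cube = 0.15 * np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                       dtype=np.float64)
# Fake "detected" keypoints by projecting the cube with a known ground-truth pose;
# a real system would obtain them from a keypoint detector.
rvec_gt = np.array([0.1, 0.2, 0.3])
tvec_gt = np.array([0.0, 0.0, 2.0])
kp2d, _ = cv2.projectPoints(cube, rvec_gt, tvec_gt, K, None)
ok, R_est, t_est = estimate_object_pose(cube, kp2d.reshape(-1, 2), K)
```

Composing the recovered camera-frame pose with the fixed camera's extrinsics then yields the object pose in the indoor scene.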
Method
First, we propose a plane-prior-based object pose estimation method. Built on a keypoint-detection monocular localization framework, it uses a planar constraint to optimize the 3-DoF (degree of freedom) pose of the object, which improves the localization stability of targets moving on an indoor plane under a fixed view (see the sketch after this paragraph).
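As a rough illustration of the planar prior (the paper's exact optimization is not reproduced here), the sketch below reduces a raw 6-DoF pose estimate, expressed in a world frame whose z-axis is the ground-plane normal, to the three free degrees of freedom (x, y, yaw) and rebuilds the constrained pose. The function names are assumptions.

```python
# Illustrative planar constraint: a target known to move on the ground plane
# keeps only (x, y, yaw); the translation is projected onto the plane and the
# rotation is reduced to its heading about the plane normal.
import numpy as np

def project_to_plane(R_world_obj, t_world_obj, plane_height=0.0):
    """Reduce a 6-DoF pose (R, t) in the world frame to a 3-DoF planar pose."""
    x, y = t_world_obj[0], t_world_obj[1]
    # Yaw: direction of the object's x-axis projected into the ground plane.
    heading = R_world_obj[:, 0]
    yaw = np.arctan2(heading[1], heading[0])
    return np.array([x, y, yaw]), plane_height

def planar_pose_to_matrix(pose_xy_yaw, plane_height=0.0):
    """Rebuild the constrained 4x4 pose from (x, y, yaw) and the plane height."""
    x, y, yaw = pose_xy_yaw
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0],
                          [s,  c, 0.0],
                          [0.0, 0.0, 1.0]])
    T[:3, 3] = [x, y, plane_height]
    return T

# Example: a raw estimate lifted slightly off the plane is snapped back onto it.
pose3, h = project_to_plane(np.eye(3), np.array([1.2, 0.8, 0.05]), plane_height=0.0)
T_constrained = planar_pose_to_matrix(pose3, h)
```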
Second, we design a data fusion framework based on the unscented Kalman filter: to improve the reliability of the moving target's pose estimate, the passive perception result obtained from the fixed view is fused with the active perception result obtained from the moving view. The resulting active-passive indoor visual localization system consists of three modules: 1) a passive localization module, 2) an active localization module, and 3) an active-passive fusion module. The passive localization module takes the RGB image captured by a fixed indoor camera as input and outputs the pose of the target observed in that image. The active localization module takes the RGB image captured from the viewpoint of the target to be localized as input and outputs the target's position and pose in the 3D scene. The active-passive fusion module integrates the passive and active localization results and outputs a more accurate pose of the target in the indoor scene. A sketch of this fusion step follows.
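The sketch below shows a minimal version of the fusion step, assuming a constant-velocity motion model on the ground plane and using the third-party filterpy library for the unscented Kalman filter. The state layout, noise values, and frame interval are illustrative assumptions, not the paper's exact configuration.

```python
# Fusing the passive (fixed-camera) and active (onboard-camera) position
# estimates of a target moving on the ground plane with an unscented Kalman filter.
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

DT = 0.1  # assumed frame interval in seconds

def fx(x, dt):
    # Constant-velocity motion model: state = [px, py, vx, vy].
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    return F @ x

def hx(x):
    # Both modules measure the planar position [px, py] of the target.
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=-1.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=DT, hx=hx, fx=fx, points=points)
ukf.x = np.zeros(4)                 # initial state
ukf.P = np.eye(4) * 0.5             # initial covariance
ukf.Q = np.eye(4) * 1e-3            # process noise (assumed)
R_PASSIVE = np.eye(2) * 0.02 ** 2   # fixed-view measurement noise (assumed)
R_ACTIVE = np.eye(2) * 0.04 ** 2    # moving-view measurement noise (assumed)

def fuse_step(z_passive=None, z_active=None):
    """One filter step: predict, then sequentially update with whichever
    measurements are available; either may be missing, e.g. under occlusion."""
    ukf.predict()
    if z_passive is not None:
        ukf.update(np.asarray(z_passive), R=R_PASSIVE)
    if z_active is not None:
        ukf.update(np.asarray(z_active), R=R_ACTIVE)
    return ukf.x[:2]                # fused planar position estimate
```

Sequential updates let the filter keep tracking when one module temporarily fails, which is the occlusion and target-loss situation the fusion is designed to handle.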
Result
The average localization error of the proposed system is 2-3 cm on the iGibson simulation dataset, and 99% of the localization errors are within 10 cm. In real scenes, the average localization error is 3-4 cm, and more than 90% of the errors are within 10 cm; the system therefore achieves centimeter-level accuracy. The real-scene experiments further show that the active-passive fusion effectively suppresses the external disturbances that affect the fixed-view passive positioning algorithm, such as the limited viewing angle and object occlusion, and mitigates the insufficient stability and large random error of single-frame positioning.
Conclusion
The proposed indoor visual localization system combines the advantages of passive and active localization. Built on an unscented Kalman filter fusion framework and intended for indoor mobile robot operation, it achieves high-precision, centimeter-level target localization in indoor scenes at lower equipment cost than existing visual positioning algorithms, and it remains robust under complex environmental interference such as occlusion and target loss. Its performance is validated in both simulated and physical environments, and the experimental results demonstrate high positioning accuracy and robustness across multiple scenarios.