A map construction method for unknown environments combining semantic information and 3D point cloud technology

Ma Miao, Liu Peimin, Pan Haipeng (School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018)

Abstract
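Objective When performing simultaneous localization and mapping (SLAM), a robot needs to make effective use of scene information from unknown, complex environments. To address the insufficient understanding of scene details and the loss of detail in the maps built by existing SLAM algorithms, this paper constructs a map construction method for unknown environments that combines SLAM point cloud localization with a semantic segmentation network, achieving high-precision 3D map reconstruction. Method First, the real-time color information of the scene is used to estimate the camera pose, and a deep learning network that fuses spatial multi-scale sparse and dense features, HieSemNet (hierarchical semantic network), is constructed to semantically segment the unknown scene and obtain real-time 2D semantic information. Second, depth information and the camera pose are used to estimate the spatial point cloud, and the 2D semantic segmentation information is fused with the 3D point cloud so that each segmentation result is mapped to the corresponding spatial position in the point cloud, producing a high-precision point cloud map with semantic information and achieving 3D map reconstruction. Result To verify the effectiveness of the method, validation experiments are conducted on both the HieSemNet network and the semantic SLAM system. The results show that the network achieves good accuracy in both mean pixel accuracy (MPA) and mean intersection over union (MIoU): compared with the other networks, MPA improves by 17.47%, 11.67%, 4.86%, 2.90%, and 0.44%, respectively, and MIoU improves by 13.94%, 1.10%, 6.28%, 2.28%, and 0.62%, respectively. The proposed SLAM algorithm obtains more mapping information, and the maps it builds are better in both precision and accuracy. Conclusion The method fully accounts for the segmentation of objects of different sizes, and the proposed HieSemNet network effectively improves the accuracy of scene semantic segmentation. In addition, compared with existing state-of-the-art semantic SLAM systems, the method clearly improves the precision and accuracy of mapping and yields higher-quality maps.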
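The fusion step described in the Method part (depth plus camera pose gives a 3D point, and the 2D segmentation label is attached to that point) can be illustrated with a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and a camera-to-world pose given as a rotation matrix and a translation vector; the function name `backproject_labeled` and all parameter names are hypothetical and are not taken from the paper, and no octree or map management is included.

```python
from typing import List, Sequence, Tuple

# x, y, z in the world frame, plus the semantic class id of the source pixel.
LabeledPoint = Tuple[float, float, float, int]


def backproject_labeled(
    depth: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[LabeledPoint]:
    """Lift every valid depth pixel to a world-frame point carrying its 2D label.

    Assumption: a pinhole camera and a camera-to-world pose (rotation, translation).
    """
    points: List[LabeledPoint] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # non-positive depth is treated as a missing measurement
                continue
            # Pinhole back-projection into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Camera-to-world transform: p_world = R * p_cam + t.
            world = [
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            ]
            points.append((world[0], world[1], world[2], labels[v][u]))
    return points
```

Each returned tuple pairs a spatial position with a class id, which is the kind of record a semantic point cloud map would store; how the paper's system organizes and updates these records is not shown here.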
Keywords
The 3D point cloud based semantic information-relevant map construction method for unrecognized scenario

Ma Miao, Liu Peimin, Pan Haipeng(School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China)

Abstract
Objective With the continued development of computer technology and artificial intelligence, intelligent robots have advanced rapidly. Simultaneous localization and mapping (SLAM) is an effective technique by which a robot can perceive scene information. In SLAM, a robot starts from an unknown position in an unknown environment, localizes itself from the map features it observes, and then builds a complete map of the scene from its own pose and trajectory. The environment map built by traditional SLAM lacks semantic information, so the robot's understanding of the scene is limited. To perceive increasingly complex scenes, some researchers have introduced deep learning methods into SLAM systems to recognize objects in the scene. However, insufficient scene recognition and incomplete map building remain challenging problems. A robot performing SLAM must explore unknown environments and make effective use of the scene information in complex environments. Existing SLAM algorithms understand scene details insufficiently and lose detail during map building, while existing semantic segmentation algorithms perform poorly on multi-scale objects, segment slowly, and produce indistinct segmentation results. Our main research objectives are therefore to improve the ability of the semantic segmentation algorithm to recognize multi-scale objects and to improve the accuracy and precision of map construction with semantic SLAM. 
We propose a map construction method for unknown environments that combines SLAM point cloud localization with a semantic segmentation network; it identifies objects of different sizes in the scene effectively and realizes high-precision 3D map reconstruction. Method We design a deep learning semantic segmentation network that fuses spatial multi-scale sparse and dense features, called the hierarchical semantic network (HieSemNet). A spatial pyramid module uses dilated convolutions with different dilation rates, and this multi-scale structure extracts features that capture global contextual information. The network consists of two branches: the base feature extraction network and the spatial pyramid module. To supervise training and compute the loss function, semantic labels at different scales are used separately for the two branches. The final feature map is generated by weighted fusion of the feature maps of the two branches. The semantic segmentation network is then applied to the SLAM system, in which map construction is completed by three modules: tracking, local mapping, and LoopClosing. The tracking module extracts ORB (oriented FAST and rotated BRIEF) features from the image sequence acquired by the RGB-D camera, determines key frames based on the ORB feature point pairs between frames, and performs camera pose estimation. The local mapping module further filters the inserted key frames, then computes and filters the map points associated with them. The LoopClosing module optimizes and updates the generated map. The steps of the algorithm are as follows. First, the real-time color information of the scene captured by the RGB-D camera is used for camera pose estimation and trajectory calculation. 
Next, HieSemNet, a deep learning network that fuses spatial multi-scale sparse and dense features, is used to semantically segment the unknown scene and obtain real-time 2D semantic information. Then, the spatial point cloud is estimated from depth information and camera poses, and an octree of the spatial relations of the point cloud is constructed. Finally, the 2D semantic segmentation information is fused with the 3D point cloud information so that each segmentation result corresponds to its spatial position in the octree, which yields a high-precision point cloud map with semantic information and realizes 3D map reconstruction. Result To verify the effectiveness of the proposed method, validation experiments are conducted on HieSemNet and on the semantic SLAM system. HieSemNet is compared with other state-of-the-art networks, namely the fully convolutional network (FCN), segmentation network (SegNet), pyramid scene parsing network (PSPNet), DeepLabv3, and segmentation transformer (SETR), in terms of segmentation accuracy on the classical semantic segmentation dataset ADE20K. The experimental results show that the proposed network achieves good mean pixel accuracy (MPA) and mean intersection over union (MIoU). Because HieSemNet obtains a large receptive field through dilated convolution without losing much detail, it segments both large and small objects more accurately. Compared with the above networks, its MPA improves by 17.47%, 11.67%, 4.86%, 2.90%, and 0.44%, respectively, and its MIoU improves by 13.94%, 1.10%, 6.28%, 2.28%, and 0.62%, respectively. The proposed SLAM algorithm is tested on office scenes, warehouse scenes of the TUM RGB-D dataset, and a natural environment. 
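The two segmentation metrics reported above, mean pixel accuracy and mean intersection over union, are commonly computed from a class confusion matrix. The following is a minimal sketch of those standard definitions, not the paper's evaluation code; the function names are hypothetical, and the choice to skip classes that never occur (rather than count them as zero) is an assumption that evaluation toolkits handle differently.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(
    truth: Sequence[int], pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mpa_and_miou(matrix: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (mean pixel accuracy, mean IoU) averaged over classes that occur."""
    n = len(matrix)
    accuracies: List[float] = []
    ious: List[float] = []
    for i in range(n):
        true_positive = matrix[i][i]
        ground_truth_total = sum(matrix[i])                   # pixels whose label is i
        predicted_total = sum(matrix[r][i] for r in range(n))  # pixels predicted as i
        if ground_truth_total > 0:
            accuracies.append(true_positive / ground_truth_total)
        union = ground_truth_total + predicted_total - true_positive
        if union > 0:
            ious.append(true_positive / union)
    if not accuracies or not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(accuracies) / len(accuracies), sum(ious) / len(ious)
```

For flattened label maps `truth` and `pred`, calling `mpa_and_miou(confusion_matrix(truth, pred, num_classes))` gives both scores in the range 0 to 1.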
The map building process, trajectory accuracy, and absolute trajectory error of the SLAM algorithm are reported for the three scenes. The comparative results show that the maps we construct contain more mapping information and fewer blank or wrong regions, that the contour and position of objects in them are more accurate, and that small, cluttered objects have less adverse effect. The maps therefore represent the actual scene more accurately. Conclusion The proposed method fully accounts for the segmentation of objects of different sizes, and the proposed HieSemNet network effectively improves the accuracy of scene semantic segmentation.
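The absolute trajectory error mentioned in the Result part is usually summarized as the root-mean-square distance between estimated and ground-truth camera positions. The sketch below assumes the two trajectories are already time-associated and expressed in the same coordinate frame; full evaluations normally perform a rigid alignment first, which is omitted here. The function name `ate_rmse` is hypothetical and is not the paper's evaluation code.

```python
import math
from typing import Sequence


def ate_rmse(
    estimated: Sequence[Sequence[float]],
    ground_truth: Sequence[Sequence[float]],
) -> float:
    """RMSE of position differences between matched trajectory poses.

    Assumption: poses are matched one to one and already aligned to a common frame.
    """
    if len(estimated) != len(ground_truth) or len(estimated) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the estimated trajectory stays closer to the ground truth over the whole sequence.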
Keywords
