Point cloud semantic segmentation combining bilateral cross-enhancement and self-attention compensation

Zhu Zhongjie1, Zhang Rong2, Bai Yongqiang2, Wang Yu'er2, Sun Jiamin3,4 (1. College of Information and Intelligent Engineering, Zhejiang Wanli University, Ningbo, Zhejiang; 2. Ningbo Key Laboratory of DSP, Zhejiang Wanli University; 3. Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315000; 4. College of Information Science and Engineering, Ocean University of China, Qingdao 266000)

Abstract
Objective Existing point cloud semantic segmentation methods make insufficient use of geometric and semantic feature information, which degrades segmentation performance and, in particular, the accuracy of local fine-grained segmentation. To address this problem, a new point cloud semantic segmentation algorithm combining bilateral cross-enhancement with self-attention compensation is proposed, which fully fuses geometric and semantic contextual information to improve segmentation performance. Method First, a spatial aggregation module based on bilateral cross-enhancement is designed: local geometric and semantic contextual information is mapped into a common space for cross-learning enhancement and then aggregated into local contextual information. Next, global contextual information extracted by a self-attention mechanism is fused with the enhanced local contextual information to compensate for the singularity of the local context, yielding a complete feature map. Finally, the multi-resolution features output at each stage of the spatial aggregation module are fed into a feature fusion module for multi-scale feature fusion, producing the final comprehensive feature map for high-performance semantic segmentation. Result Experiments show that on the S3DIS dataset the proposed algorithm achieves a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, improvements of 2.4%, 2.0%, and 1.0% over the representative algorithm RandLA-Net. When tested separately on Area 5 of the S3DIS dataset, the mIoU is 66.2%, 5.0% higher than RandLA-Net. Conclusion The spatial aggregation module not only fully exploits local geometric and semantic contextual information to enhance the local context, but also fuses local and global contextual information through the self-attention mechanism, improving feature completeness and the association between local and global contexts, which effectively raises the accuracy of local fine-grained segmentation. In the visual analysis, compared with the baseline algorithms, the proposed algorithm clearly improves the local fine-grained segmentation of point cloud scenes, validating its effectiveness.
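The multi-scale feature-fusion step described in the Method above can be illustrated with a minimal NumPy sketch. Nearest-neighbor upsampling and channel-wise concatenation are assumptions for illustration; the paper's feature fusion module may use a different interpolation and learned fusion layers.

```python
import numpy as np

def upsample_nearest(coarse_feat, coarse_xyz, fine_xyz):
    """Copy each fine point's feature from its nearest coarse point."""
    d2 = ((fine_xyz[:, None, :] - coarse_xyz[None, :, :]) ** 2).sum(-1)
    return coarse_feat[d2.argmin(axis=1)]

def fuse_multiscale(stages):
    """stages: list of (xyz, feat) pairs ordered from coarse to fine.
    Each coarser feature map is upsampled to the finest resolution and
    all maps are concatenated channel-wise into one comprehensive map."""
    fine_xyz, fine_feat = stages[-1]
    fused = [fine_feat]
    for xyz, feat in stages[:-1]:
        fused.append(upsample_nearest(feat, xyz, fine_xyz))
    return np.concatenate(fused, axis=1)
```

Concatenation keeps every resolution's evidence available to the segmentation head, instead of forcing all scales through one fixed-width representation.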
Keywords
Bilateral cross enhancement with self-attention compensation for semantic segmentation of point clouds


Abstract
Background Point cloud semantic segmentation is a computer vision task that aims to segment three-dimensional point cloud data and assign a semantic label to each point. Specifically, according to the location and other attributes of each point, point cloud semantic segmentation assigns every point to a predefined semantic category such as ground, building, vehicle, or pedestrian. Existing methods can be broadly categorized into three types: projection-based methods, voxel-based methods, and raw point cloud-based methods. Projection-based methods project the three-dimensional point cloud onto a two-dimensional plane (such as an image) and then apply standard image-based segmentation techniques. Voxel-based methods divide the point cloud space into regular voxel grids and assign semantic labels to each voxel. Both approaches require a data transformation, which inevitably causes some loss of feature information. In contrast, raw point cloud-based methods process the point cloud directly, without any transformation, thereby preserving the integrity of the original point cloud data fed into the network. Objective To achieve accurate semantic segmentation, it is necessary to fully consider and utilize both the geometric and the semantic feature information of each point in the point cloud scene. Existing point cloud semantic segmentation methods generally extract, process, and utilize geometric and semantic feature information separately, without considering their correlation, which leads to less precise local fine-grained segmentation. Therefore, this paper proposes a new point cloud semantic segmentation algorithm based on bilateral cross-enhancement and self-attention compensation. It not only fully utilizes the geometric and semantic feature information of the point cloud but also constructs offsets between them as a medium for information interaction.
In addition, local and global feature information are fused, enhancing feature completeness and overall segmentation performance. This fusion ensures that local and global contexts are fully represented and utilized during segmentation. By considering the overall information of the point cloud scene, the algorithm segments both local fine-grained details and larger-scale structures more accurately. Method First, the original input point cloud is preprocessed to extract geometric contextual information and initial semantic contextual information. The geometric contextual information is represented by the original coordinates of the points in three-dimensional space, while the initial semantic contextual information is extracted with a multilayer perceptron (MLP). Next, a spatial aggregation module is designed, consisting of bilateral cross-enhancement units and self-attention units. In the bilateral cross-enhancement units, local geometric and semantic contextual information is first extracted by constructing local neighborhoods over the preprocessed geometric and initial semantic contextual information. Offsets are then constructed to facilitate cross-learning and mutual enhancement of the local geometric and semantic contextual information by mapping both onto a common space. Finally, the enhanced local geometric and semantic contextual information is aggregated into local contextual information. A self-attention mechanism then extracts global contextual information, which is fused with the local contextual information to compensate for its singularity, resulting in a comprehensive feature map.
Lastly, the multi-resolution feature maps produced at the different stages of the spatial aggregation module are fed into the feature fusion module for multi-scale feature fusion, yielding the final comprehensive feature map and thereby achieving high-performance semantic segmentation. Results The experimental results on the S3DIS dataset show a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, which are 2.4%, 2.0%, and 1.0% higher, respectively, than the existing representative algorithm RandLA-Net. For Area 5 of the S3DIS dataset, the mIoU is 66.2%, 5.0% higher than RandLA-Net. In addition, visualizations of the segmentation results were produced on the Semantic3D dataset. Conclusion Through the spatial aggregation module, the proposed algorithm makes full use of geometric and semantic contextual information, enriching the details of the local context. The integration of local and global contextual information through the self-attention mechanism further ensures a comprehensive feature representation. As a result, the proposed algorithm significantly improves the segmentation accuracy of fine-grained details in point clouds. Visual analysis further validates its effectiveness: compared with the baseline algorithms, the proposed algorithm is clearly superior in the fine-grained segmentation of local regions in point cloud scenes, offering a promising solution for improving the segmentation accuracy of fine-grained details in local regions of point clouds.
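The bilateral cross-enhancement and self-attention compensation steps described in the Method can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: the neighborhood size, common-space dimension, and random projection matrices standing in for learned MLP weights are all assumptions.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k-nearest-neighbor indices (including the point itself)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def bilateral_cross_enhance(xyz, feat, k=8, dim=32, seed=0):
    """Toy bilateral cross-enhancement: geometric and semantic offsets
    to the center point act as the interaction medium. Both are projected
    into a common space, each branch is enhanced with the other branch's
    representation, and the neighborhood is max-pooled into one local
    contextual feature per point."""
    rng = np.random.default_rng(seed)
    n, c = feat.shape
    idx = knn_indices(xyz, k)                     # (n, k) neighborhoods
    geo_off = xyz[idx] - xyz[:, None, :]          # (n, k, 3) geometric offsets
    sem_off = feat[idx] - feat[:, None, :]        # (n, k, c) semantic offsets
    w_geo = rng.normal(scale=0.1, size=(3, dim))  # projections to common space
    w_sem = rng.normal(scale=0.1, size=(c, dim))  # (stand-ins for learned MLPs)
    geo_common = geo_off @ w_geo                  # (n, k, dim)
    sem_common = sem_off @ w_sem                  # (n, k, dim)
    geo_enh = geo_common + np.maximum(sem_common, 0.0)  # geometry enhanced by semantics
    sem_enh = sem_common + np.maximum(geo_common, 0.0)  # semantics enhanced by geometry
    # aggregate the enhanced neighborhood into local contextual information
    return np.concatenate([geo_enh.max(1), sem_enh.max(1)], axis=1)  # (n, 2*dim)

def self_attention_compensate(local_feat, seed=1):
    """Single-head dot-product self-attention over all points supplies
    global context, which is added back to the local features to
    compensate for the singularity of the local context."""
    rng = np.random.default_rng(seed)
    n, d = local_feat.shape
    wq, wk, wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
    q, k_, v = local_feat @ wq, local_feat @ wk, local_feat @ wv
    scores = q @ k_.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return local_feat + attn @ v                  # global-context compensation
```

The residual form of the last line reflects the "compensation" idea: global context is added to, rather than substituted for, the locally aggregated features.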
Keywords
