Multi-stage guidance network for constructing dense depth maps by combining LiDAR and RGB data

Jia Di1,2, Wang Zitao1, Li Yuyang1, Jin Zhiyang1, Liu Zeyang1, Wu Si1 (1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China; 2. Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao 125105, China)

Abstract
Objective Using a single RGB image to guide a sparse light detection and ranging (LiDAR) point cloud toward a dense depth map has gradually become a research hotspot. However, when existing methods construct scene depth information, the depth at object edges remains blurred, which degrades the accuracy of 3D reconstruction and photogrammetry. This paper therefore proposes a dense depth map construction method based on a multi-stage guidance network. Method The multi-stage guidance network consists of a guidance-information-guided path and an RGB-information-guided path. On the guidance-information-guided path, an ERF (efficient residual factorized) network fuses the sparse LiDAR point cloud with the RGB data to extract early guidance information; a guidance information processing module fuses the sparse depth with the early guidance information, and the fused result is processed by bilinear interpolation to construct surface normals. The mid-term guidance information extracted by the multi-modal information fusion guidance module and the surface normal information are then fed into the ERF network to extract late guidance information that can guide sparse depth densification, from which the dense depth map on this path is constructed. On the RGB-information-guided path, the early guidance information guides the fusion of the sparse depth and RGB information, the multi-modal information fusion guidance module produces the dense depth map on this path, and a refinement module reduces the errors in that dense depth map. The results of the two paths are fused to obtain the final dense depth map. Result The multi-stage guidance network was trained on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) depth estimation dataset, and the test results were submitted to the official KITTI evaluation server. Among the evaluation metrics, the root mean square error and the root mean square error of the inverse depth are 768.35 and 2.40, respectively, both lower than those of the compared methods, and the proposed method reconstructs object edges and details more accurately. Conclusion The proposed multi-stage guidance network improves the accuracy of dense depth map construction and compensates for the sparsity of LiDAR point clouds; the experimental results verify the effectiveness of the method.
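The surface-normal construction by bilinear interpolation mentioned in the Method above can be illustrated with a minimal PyTorch sketch; the tensor shapes and the direct channel-wise normalization are hypothetical simplifications, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: a low-resolution map fused from sparse depth and early
# guidance information is upsampled bilinearly to full resolution and
# normalized channel-wise to act as a 3-channel surface-normal estimate.
fused = torch.randn(1, 3, 64, 128)      # fused sparse depth + guidance features
normals = F.interpolate(fused, size=(256, 512), mode="bilinear", align_corners=False)
normals = F.normalize(normals, dim=1)   # unit-length normal vector per pixel
```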
Multi-stage guidance network for constructing dense depth map based on LiDAR and RGB data

Jia Di1,2, Wang Zitao1, Li Yuyang1, Jin Zhiyang1, Liu Zeyang1, Wu Si1(1.School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China;2.Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao 125105, China)

Abstract
Objective Depth information plays an important role in autonomous driving and robot navigation, but the depth collected by light detection and ranging (LiDAR) is sparse and noisy. To address these problems, several recently proposed methods that use a single image to guide sparse depth in constructing a dense depth map have shown good performance. However, many methods cannot adequately learn the depth at the edges and details of objects. This paper proposes a multi-stage guidance network model to cope with this challenge. Deformable convolution and the efficient residual factorized (ERF) network are introduced into the model, and the quality of the dense depth map is improved from the angle of geometric constraint through surface normal information. The network is driven by the depth and guidance information it extracts: the information extracted from the RGB image serves as guidance to steer sparse depth densification and to correct errors in the depth information. Method The multi-stage guidance network is composed of a guidance-information-guided path and an RGB-information-guided path. On the guidance-information-guided path, the sparse depth and the RGB image are first fused through the ERF network to obtain the initial guidance information, and the sparse depth and the initial guidance information are fed into the guidance information processing module to construct the surface normal. Second, the surface normal and the mid-term guidance information produced by the multi-modal information fusion guidance module are input into the ERF network, and late guidance information containing rich depth cues is extracted under the constraint of the surface normal. The late guidance information is used to guide sparse depth densification.
At the same time, the sparse depth is introduced again to compensate for depth information ignored in the early stage, yielding the dense depth map constructed on this path. On the RGB-information-guided path, the initial guidance information guides the fusion of the sparse depth with the features extracted from the RGB image and reduces the influence of the noise and sparsity of the sparse depth. The mid-term guidance information and an initial dense depth map rich in depth cues are extracted by the multi-modal information fusion guidance module. Because the initial dense depth map still contains errors, a refinement module corrects it to obtain an accurate dense depth map. The network combines sparse depth and guidance information by element-wise addition, which effectively guides sparse depth densification, while concatenation preserves the distinct features of the different inputs so that the network and its modules can extract more features. Overall, the initial guidance information is extracted from the input, promotes the construction of the surface normal, and guides the fusion of sparse depth and RGB information; the mid-term guidance information, obtained from the multi-modal information fusion guidance module, is the key information connecting the two paths; and the late guidance information, obtained by fusing the mid-term guidance information with the surface normal, guides sparse depth densification. Of the two paths, the guidance-information-guided path constructs a dense depth map by using the initial, mid-term, and late guidance information to guide the sparse depth, while on the RGB-information-guided path the multi-modal information fusion guidance module guides the sparse depth through the RGB information. Result The proposed network is implemented in PyTorch and trained with the Adam optimizer.
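The two-path layout described in the Method section can be sketched at a high level as a PyTorch forward pass. All module names and channel widths below are hypothetical placeholders (plain convolutions stand in for the ERF, fusion, and refinement modules); the sketch only illustrates how the paths exchange guidance information and are fused by addition at the end:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Stand-in for the paper's ERF / fusion modules (hypothetical simplification).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class MultiStageGuidanceSketch(nn.Module):
    """Illustrative two-path layout: guidance path + RGB path, fused by addition."""
    def __init__(self, feat=16):
        super().__init__()
        self.early_guidance = conv_block(4, feat)        # fuse RGB (3) + sparse depth (1)
        self.guidance_proc  = conv_block(feat + 1, 3)    # -> surface-normal-like map
        self.mid_fusion     = conv_block(feat + 4, feat) # multi-modal fusion (mid guidance)
        self.late_guidance  = conv_block(feat + 3, 1)    # guidance-path dense depth
        self.rgb_path       = conv_block(feat + 4, 1)    # RGB-path initial dense depth
        self.refine         = conv_block(1, 1)           # refinement module

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        g_early = self.early_guidance(x)                           # early guidance
        normals = self.guidance_proc(torch.cat([g_early, sparse_depth], dim=1))
        g_mid   = self.mid_fusion(torch.cat([g_early, x], dim=1))  # mid-term guidance
        d_guid  = self.late_guidance(torch.cat([g_mid, normals], dim=1))
        d_rgb   = self.refine(self.rgb_path(torch.cat([g_mid, x], dim=1)))
        return d_guid + d_rgb                                      # fuse the two paths
```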
The parameters of the Adam optimizer are set to β1 = 0.9 and β2 = 0.999. The images input to the network are cropped to 256 × 512 pixels, the graphics card is an NVIDIA 3090, the batch size is set to 6, and training runs for 30 epochs. The initial learning rate is 0.000 125 and is halved every 5 epochs. The Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) depth estimation dataset contains more than 93 000 pairs of ground truth data, aligned LiDAR sparse depth data, and RGB images. A total of 85 898 pairs are used for training, while the officially released 1 000 validation pairs with ground truth and 1 000 test pairs without ground truth are used for testing. Results on the validation set can be evaluated directly because ground truth is available; results on the test set must be submitted to the official KITTI evaluation server to obtain public evaluation results, which provide an important basis for a fair assessment of model performance. Neither the validation set nor the test set participates in the training of the network model. The root mean square error (RMSE) and the root mean square error of the inverse depth (iRMSE) are lower than those of the other methods, and the accuracy of the depth at the edges and details of objects is visibly higher. Conclusion A multi-stage guidance network model for dense depth map construction from LiDAR and RGB information is presented in this paper. The guidance information processing module promotes the fusion of guidance information and sparse depth. The multi-modal information fusion guidance module learns a large amount of depth information from the sparse depth and RGB images. The refinement module corrects the output of the multi-modal information fusion guidance module.
In summary, the dense depth map constructed by the multi-stage guidance network combines the guidance-information-guided path and the RGB-information-guided path. The two strategies for building the dense depth map complement each other effectively, using more information to obtain more accurate dense depth maps. Experiments on the KITTI depth estimation dataset show that the multi-stage guidance network effectively handles the depth at the edges and details of objects and improves the construction quality of dense depth maps.
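The training configuration reported in the Result section (Adam with β1 = 0.9 and β2 = 0.999, initial learning rate 0.000 125 halved every 5 epochs over 30 epochs) and the two reported metrics map onto standard PyTorch components as follows. The one-layer model is a placeholder, and the metric helpers are illustrative sketches, not the official KITTI evaluation code:

```python
import torch

model = torch.nn.Conv2d(4, 1, 3, padding=1)   # placeholder for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4, betas=(0.9, 0.999))
# Halve the learning rate every 5 epochs, for 30 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

def rmse(pred, gt, valid):
    """Root mean square error over pixels that have ground truth."""
    return torch.sqrt(((pred - gt)[valid] ** 2).mean())

def irmse(pred, gt, valid):
    """RMSE of the inverse depth; with depth in metres, scale by 1000 for 1/km."""
    return torch.sqrt(((1.0 / pred - 1.0 / gt)[valid] ** 2).mean())
```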
