面向点云几何压缩的隐式编码网络
陈佳慧^{1}, 方广驰^{1}, 李浩然^{1}, 张晔^{1}, 黄小红^{1}, 郭裕兰^{2}(1.中山大学电子与通信工程学院;2.国防科技大学电子科学学院) 摘 要
目的 现有点云几何压缩算法通常将点云转换为八叉树或带潜在特征的稀疏点，从而提高数据结构的存储效率。这些方法将点云量化至三维网格点，导致点云所在表面的精度受限于量化分辨率。针对这一问题，本文将点云转化为连续的隐式表征，提出一种基于隐式表征的点云几何压缩算法框架，以克服量化分辨率对压缩质量的不利影响。方法 该框架由基于符号对数距离场的隐式表征与带乘性分支结构的神经网络组成。具体来说，本文在编码阶段利用神经网络拟合隐式表征，并对该网络进行模型压缩，然后在解码阶段结合改进的 Marching Cubes 算法重建点云所在表面，采样恢复点云数据。结果 本文在 ABC、Famous 与 MPEG PCC 数据集上进行了点云表面压缩实验。与基准算法 INR 相比，本文算法的 L1-CD 损失平均下降了 12.4%，Normal Consistency 与 F-score 指标平均提升 1.5% 与 13.6%，压缩效率随模型参数量增大而提升，平均增幅为 12.9%。与几何压缩标准算法 G-PCC 相比，本文算法在 10 kB 下依然保持 55 dB 以上的 D1-PSNR 重建性能，有效压缩上限高于 G-PCC。此外，消融实验分别验证了本文提出的隐式表征和神经网络结构的有效性。结论 实验结果表明，本文提出的点云压缩算法克服了现有算法的分辨率限制，不仅提升了表面重建精度，而且提升了点云表面的压缩效率与有效压缩上限。
关键词
An implicit coding network for point cloud geometry compression
(1. School of Electronic and Communication Engineering, Sun Yat-sen University; 2. College of Electronic Science, National University of Defense Technology) Abstract
Objective Point clouds captured by depth sensors or generated by reconstruction algorithms are essential for various 3D vision tasks, including 3D scene understanding, scan registration, and 3D reconstruction. However, even a simple scene or object contains a massive number of unstructured points, posing challenges for the storage and transmission of point cloud data. Therefore, developing point cloud geometry compression algorithms is significant for effectively handling and processing point cloud data. Existing point cloud compression algorithms typically convert point clouds into storage-efficient data structures, such as an octree representation or sparse points with latent features. These intermediate representations are then encoded into a compact bitstream using either handcrafted or learning-based entropy coders. Although the correlation of spatial points effectively improves compression performance, existing algorithms may not fully exploit these points as representations of the object surface and topology. Recent studies have handled this problem by exploring implicit representations and neural networks for surface reconstruction. However, these methods are primarily tailored to 3D objects represented as occupancy fields and signed distance fields, which limits their applicability to point clouds and non-watertight meshes in terms of both surface representation and reconstruction. Furthermore, the neural networks used in these approaches often rely on simple multi-layer perceptron structures, which may lack capacity and compression efficiency for point cloud geometry compression tasks. Method To address these limitations, we propose a novel point cloud geometry compression framework consisting of a signed logarithmic distance field, an implicit network structure with a multiplicative branch, and an adaptive Marching Cubes algorithm for surface extraction. First, an implicit function whose zero level-set is the point cloud surface maps arbitrary points in space to the distance values of their nearest neighbors on that surface. Here, we design an implicit representation termed the signed logarithmic distance field (SLDF), which utilizes a thickness assumption and a logarithmic parameterization to fit arbitrary point cloud surfaces.
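As a rough illustration of the representation above, one plausible SLDF formulation can be sketched as follows. This is our own sketch, not the paper's exact definition: the `thickness` band, the `alpha` scale, and the use of `log1p` as the logarithmic parameterization are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def sldf(query, points, thickness=0.01, alpha=10.0):
    """Hypothetical signed logarithmic distance field (sketch).

    The unsigned nearest-neighbor distance is signed via a thickness
    assumption (points within a thin shell around the cloud count as
    "inside") and compressed with a logarithmic parameterization.
    """
    d, _ = cKDTree(points).query(query)   # unsigned distance to the cloud
    shell = d - thickness                 # negative inside the thin shell
    return np.sign(shell) * np.log1p(alpha * np.abs(shell))
```

Under this sketch, queries on the cloud fall inside the thickness shell and yield negative values while distant queries yield positive values, so the zero level-set hugs the point cloud surface.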
Then, a multiplicative implicit neural encoding network (MINE) is applied to encode the surface as a compact neural representation. The MINE network combines sinusoidal activation functions and multiplicative operators to enhance the capacity and distribution characteristics of the network. The overfitting process transforms the mapping function from point cloud coordinates to implicit distance values into a neural network, which is subsequently subjected to model compression. Finally, from the decompressed network, the continuous surface is reconstructed using the adaptive Marching Cubes (AMC) algorithm. The AMC algorithm extends the Marching Cubes algorithm with a dual-layer surface fusion process to further enhance the accuracy of surface extraction for the SLDF. Result We compared our algorithm with six state-of-the-art algorithms, including surface compression approaches based on implicit representations and point cloud compression methods, on three public datasets, namely, the ABC, Famous, and MPEG PCC datasets. The quantitative evaluation metrics include the rate-distortion curves of the Chamfer-L1 distance (L1-CD), normal consistency (NC), and F-score for continuous point cloud surfaces, and the rate-distortion curve of D1-PSNR for quantized point cloud surfaces. Compared with the second-best method (i.e., INR), our method reduces the L1-CD loss by 12.4% and improves the NC and F-score performance by 1.5% and 13.6%, respectively, on the ABC and Famous datasets. Moreover, the compression efficiency increases with the number of model parameters, with an average gain of 12.9%. On multiple MPEG PCC datasets, with samples from the 512-resolution MVUB dataset, the 1024-resolution 8iVFB dataset, and the 2048-resolution Owlii dataset, our method maintains a D1-PSNR above 55 dB within a 10 kB budget, indicating a higher effective compression limit than G-PCC.
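The multiplicative encoding network described in the Method section could be sketched roughly as below. The layer widths, the initialization, and the exact point where the branch multiplies into the trunk are our own assumptions; the paper does not specify these details here.

```python
import numpy as np

def sine_layer(x, W, b, w0=30.0):
    # SIREN-style sinusoidal activation: sin(w0 * (xW + b))
    return np.sin(w0 * (x @ W + b))

def init_params(rng, d_in=3, hidden=64):
    # Small random initialization; real implicit networks use
    # activation-specific initialization schemes.
    def w(shape):
        return rng.uniform(-1.0, 1.0, shape) / np.sqrt(shape[0])
    return {
        "W1": w((d_in, hidden)), "b1": np.zeros(hidden),
        "Wb": w((d_in, hidden)), "bb": np.zeros(hidden),  # multiplicative branch
        "W2": w((hidden, hidden)), "b2": np.zeros(hidden),
        "Wo": w((hidden, 1)), "bo": np.zeros(1),
    }

def mine_forward(coords, p):
    """Forward pass sketch: a sinusoidal trunk modulated element-wise by a
    parallel sinusoidal branch, projected to one implicit value per point."""
    h = sine_layer(coords, p["W1"], p["b1"])   # trunk features
    g = sine_layer(coords, p["Wb"], p["bb"])   # branch features
    h = sine_layer(h * g, p["W2"], p["b2"])    # multiplicative fusion
    return h @ p["Wo"] + p["bo"]               # scalar field value per point
```

In a compression setting, such a network would be overfit to one point cloud's implicit field and its weights quantized and entropy-coded as the bitstream.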
Ablation experiments show that, in the absence of the SLDF, the L1-CD loss increases by 18.53% and the D1-PSNR performance decreases by 15 dB. Similarly, without the MINE network, the L1-CD loss increases by 3.72% and the D1-PSNR performance decreases by 2.67 dB. Conclusion This work studies implicit representations for point cloud surfaces. On that basis, an enhanced point cloud compression framework is proposed. Specifically, we first design the signed logarithmic distance field to extend implicit representations to arbitrary topologies in point clouds. Then, we use the multiplicative branch network to enhance the capacity and distribution characteristics of the network. Finally, a surface extraction algorithm is applied to enhance the quality of the reconstructed point cloud. In this way, we obtain a unified framework for the geometric compression of point cloud surfaces at arbitrary resolutions. Experimental results demonstrate that our method achieves promising performance in point cloud geometry compression.
Keywords
