A point cloud compression network combining dense residual structure and multi-scale pruning
2023, Vol. 28, No. 7, pp. 2105-2119
Print publication date: 2023-07-16
DOI: 10.11834/jig.220047
Zhu Wei, Zhang Yuhang, Ying Yue, Zheng Yayu, He Defeng. 2023. A point cloud compression network combining dense residual structure and multi-scale pruning. Journal of Image and Graphics, 28(07): 2105-2119
Objective
Point clouds are an important representation of 3D data and have been widely used in autonomous driving, virtual reality, 3D measurement, and other domains. To a certain extent, they provide a denser and more realistic representation of 3D data than meshes. Because of their high resolution, point cloud data are usually very large: a dense point cloud can contain millions of points along with complex attribute information, so transmitting it consumes a great deal of network bandwidth and storage, which severely hinders wider adoption. Point cloud compression methods with a high compression ratio and low distortion are therefore needed. To this end, building on a deep-learning-based point cloud autoencoder framework, this paper proposes a compression network that combines a dense residual structure with multi-scale pruning and efficiently compresses both the geometry and the color of point clouds.
Method
First, to represent point clouds, sparse tensors in COO (coordinate) format are used instead of voxel grids, whose resolution is insufficient for sparse point clouds, and regular convolutions are replaced with sparse convolution (SC) and submanifold sparse convolution (SSC) for feature extraction. SSC preserves the sparsity pattern of its input and strengthens local feature extraction, while SC has a larger receptive field that compensates for the limited receptive field of SSC. Second, point clouds are sparse and unorganized in space, so channel-wise information tends to be more informative than spatial information. By combining channel attention with the dense residual network, which performs well in image super-resolution, we construct a three-dimensional dense residual block with channel attention (3D-RDB-CA); this module captures cross-channel dependencies of high-dimensional features and improves compression performance. Third, existing point cloud compression networks reconstruct high-resolution point clouds from low-resolution features through stacked deconvolution layers, which can produce a checkerboard effect. To mitigate this effect, compensate for feature loss during sampling, and reduce the dynamic memory footprint of training, the autoencoder adopts a multi-scale progressive structure, and a pruning layer is added after each up-sampling layer in the decoder. Guided by side information saved during encoding, the pruning layer removes feature points that contribute little to reconstruction accuracy, which lowers the dynamic memory usage of model training and speeds up convergence. Finally, to widen the applicability of the network, a color compression scheme based on the compressed geometry is designed, which preserves the global color characteristics of the point cloud.
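To make the pipeline above concrete, the following minimal sketch shows the two distinctive building blocks in code: squeeze-and-excitation style channel attention applied to sparse-tensor features, and a decoder scale whose up-sampled voxels are pruned by an occupancy classifier. It assumes PyTorch and the MinkowskiEngine sparse convolution library (Choy et al., 2019); the class names ChannelAttentionSE and UpsamplePrune are illustrative stand-ins, not the paper's actual 3D-RDB-CA module or decoder, and the hard logit threshold stands in for the side-information-guided pruning described above.

```python
import torch
import MinkowskiEngine as ME


class ChannelAttentionSE(torch.nn.Module):
    """Squeeze-and-excitation style channel attention on sparse features
    (illustrative stand-in for the attention inside 3D-RDB-CA)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(channels, channels // reduction),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(channels // reduction, channels),
            torch.nn.Sigmoid(),
        )

    def forward(self, x: ME.SparseTensor) -> ME.SparseTensor:
        # Squeeze: average the features of all occupied voxels
        # (single-sample sketch; a batched version would pool per item).
        w = self.fc(x.F.mean(dim=0, keepdim=True))  # 1 x C channel weights
        # Excite: rescale every channel, keeping the coordinate map intact.
        return ME.SparseTensor(
            x.F * w,
            coordinate_map_key=x.coordinate_map_key,
            coordinate_manager=x.coordinate_manager,
        )


class UpsamplePrune(torch.nn.Module):
    """One decoder scale: 2x transposed sparse convolution, a per-voxel
    occupancy classifier, then pruning of voxels classified as empty."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = ME.MinkowskiGenerativeConvolutionTranspose(
            in_ch, out_ch, kernel_size=2, stride=2, dimension=3)
        self.relu = ME.MinkowskiReLU(inplace=True)
        self.classify = ME.MinkowskiConvolution(
            out_ch, 1, kernel_size=3, stride=1, dimension=3)
        self.prune = ME.MinkowskiPruning()

    def forward(self, x: ME.SparseTensor):
        y = self.relu(self.up(x))
        occ = self.classify(y)           # occupancy logit per up-sampled voxel
        keep = occ.F.squeeze(-1) > 0     # simplification: threshold logits at 0
        return self.prune(y, keep), occ  # logits also feed the training loss


# Toy usage: encode one scale down, then decode back up with pruning.
coords = torch.unique(torch.randint(0, 64, (1000, 3)), dim=0)
x = ME.SparseTensor(
    features=torch.ones(len(coords), 8),
    coordinates=ME.utils.batched_coordinates([coords]),
)
down = ME.MinkowskiConvolution(8, 8, kernel_size=2, stride=2, dimension=3)
x2 = ChannelAttentionSE(8)(down(x))      # analysis: tensor stride 1 -> 2
x1, occupancy = UpsamplePrune(8, 8)(x2)  # synthesis: stride 2 -> 1, then prune
```

In the full design, several such attention-augmented convolutions would be chained with dense skip connections to form a residual dense block, and the occupancy logits at every scale would be supervised during training so that pruning keeps exactly the voxels recorded in the encoder's side information.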
Result
For geometry compression, the proposed network is compared with five other methods on three datasets: MVUB (Microsoft voxelized upper bodies), 8iVFB (8i voxelized full bodies), and Owlii (Owlii dynamic human mesh sequence dataset). The baselines are three conventional point cloud compression methods (G-PCC (octree), G-PCC (trisoup), and V-PCC) and two deep-learning-based methods (pcc_geo_cnn_v2 and learned_pcgc). Distortion is measured by the peak signal-to-noise ratio (PSNR) based on the point-to-point (D1 PSNR) and point-to-plane (D2 PSNR) metrics, and the corresponding rate-distortion curves are drawn. For G-PCC and V-PCC, the bit-rate range and the corresponding parameters are configured according to the MPEG common test conditions (CTC). Taking the proposed network as the baseline, the BD-Rate (Bjontegaard delta rate) and BD-PSNR of the other methods are calculated over the corresponding bit-rate range. Compared with the MPEG point cloud compression standard V-PCC, the proposed network achieves BD-Rate gains of 41%, 54%, and 33% on the three datasets, respectively. Its encoding time is comparable to that of G-PCC and only 2.8% of that of V-PCC. For color compression, G-PCC (octree) serves as the baseline; by setting different octree bit depths, quantization ratios, and color quality levels, the color compression distortion is obtained at different bit rates, and the YUV-PSNR of the two methods is computed under the corresponding bit rate and geometric distortion to draw rate-distortion curves. The experiments show that at low bit rates the YUV-PSNR of the proposed network is better than that of the octree-based color compression in G-PCC.
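For reference, the geometry distortion numbers above come from point-to-point (D1) and point-to-plane (D2) PSNR. The snippet below is a small sketch of the symmetric D1 PSNR only, assuming SciPy for nearest-neighbor search and the common peak convention 3*(2^bitdepth-1)^2 for voxelized content; exact conventions vary between tools (MPEG's pc_error is the usual reference implementation), so treat this as illustrative rather than the paper's evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree


def d1_psnr(ref: np.ndarray, deg: np.ndarray, bitdepth: int = 10) -> float:
    """Symmetric point-to-point (D1) geometry PSNR between two Nx3 point sets.

    Assumes the peak-energy convention 3 * (2**bitdepth - 1)**2; check the
    evaluation tool you compare against for its exact variant.
    """
    def directional_mse(a: np.ndarray, b: np.ndarray) -> float:
        # Mean squared Euclidean distance from each point of a to its
        # nearest neighbor in b.
        d, _ = cKDTree(b).query(a, k=1)
        return float(np.mean(d ** 2))

    # Symmetric error: take the worse of the two directions.
    sym_mse = max(directional_mse(ref, deg), directional_mse(deg, ref))
    peak = 3 * (2 ** bitdepth - 1) ** 2
    return 10.0 * np.log10(peak / sym_mse)


# Toy check: a point cloud against a slightly jittered copy of itself.
rng = np.random.default_rng(0)
pts = rng.integers(0, 1024, size=(2000, 3)).astype(np.float64)
print(d1_psnr(pts, pts + rng.normal(0.0, 0.5, size=pts.shape)))
```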
Conclusion
The proposed network outperforms mainstream point cloud compression methods in both geometry and color compression and preserves more of the original point cloud information at lower bit rates. It demonstrates the potential of deep learning for point cloud compression covering both geometry and color.
deep learning; point cloud compression; auto-encoder; sparse convolution; attention mechanism in point cloud; dense residual structure; multi-scale pruning
Achlioptas P, Diamanti O, Mitliagkas I and Guibas L. 2018. Learning representations and generative models for 3D point clouds//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 40-49
Alexiou E, Tung K and Ebrahimi T. 2020. Towards neural network approaches for point cloud compression//Applications of Digital Image Processing XLIII. Virtual: SPIE: 18-37 [DOI: 10.1117/12.2569115]
Ballé J, Laparra V and Simoncelli E P. 2017. End-to-end optimized image compression//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net: 1-27
Ballé J, Minnen D, Singh S, Hwang S J and Johnston N. 2018. Variational image compression with a scale hyperprior//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net: 1-23
Cao C, Preda M and Zaharia T. 2019. 3D point cloud compression: a survey//Proceedings of the 24th International Conference on 3D Web Technology. Los Angeles, USA: ACM: 1-9 [DOI: 10.1145/3329714.3338130]
Cheng Z X, Sun H M, Takeuchi M and Katto J. 2018. Deep convolutional autoencoder-based lossy image compression//Proceedings of 2018 Picture Coding Symposium (PCS). San Francisco, USA: IEEE: 253-257 [DOI: 10.1109/PCS.2018.8456308]
Choy C, Gwak J and Savarese S. 2019. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3070-3079 [DOI: 10.1109/CVPR.2019.00319]
d'Eon E, Harrison B, Myers T and Chou P A. 2017. 8i voxelized full bodies - a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006. Geneva, Switzerland: MPEG
Graham B and van der Maaten L. 2017. Submanifold sparse convolutional networks [EB/OL]. [2020-01-12]. https://arxiv.org/pdf/1706.01307.pdf
Guarda A F R, Rodrigues N M M and Pereira F. 2019. Point cloud coding: adopting a deep learning-based approach//Proceedings of 2019 Picture Coding Symposium (PCS). Ningbo, China: IEEE: 1-5 [DOI: 10.1109/PCS48520.2019.8954537]
Gwak J, Choy C and Savarese S. 2020. Generative sparse detection networks for 3D single-shot object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 297-313 [DOI: 10.1007/978-3-030-58548-8_18]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Huang T X and Liu Y. 2019. 3D point cloud geometry compression on deep learning//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 890-898 [DOI: 10.1145/3343031.3351061]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Loop C, Cai Q, Escolano S O and Chou P A. 2016. Microsoft voxelized upper bodies - a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012. Geneva, Switzerland: MPEG
Misra D. 2020. Mish: a self-regularized non-monotonic activation function [EB/OL]. [2022-01-12]. https://arxiv.org/pdf/1908.08681.pdf
Nguyen D T, Quach M, Valenzise G and Duhamel P. 2021. Learning-based lossless compression of 3D point cloud geometry//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Canada: IEEE: 4220-4224 [DOI: 10.1109/ICASSP39728.2021.9414763]
Odena A, Dumoulin V and Olah C. 2016. Deconvolution and checkerboard artifacts. Distill, 1(10): #3 [DOI: 10.23915/distill.00003]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114 [DOI: 10.5555/3295222.3295263]
Qiu S, Wu Y F, Anwar S and Li C Y. 2021. Investigating attention mechanism in 3D point cloud object detection//Proceedings of 2021 International Conference on 3D Vision (3DV). London, UK: IEEE: 403-412 [DOI: 10.1109/3DV53792.2021.00050]
Quach M, Valenzise G and Dufaux F. 2019. Learning convolutional transforms for lossy point cloud geometry compression//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 4320-4324 [DOI: 10.1109/ICIP.2019.8803413]
Quach M, Valenzise G and Dufaux F. 2020. Improved deep point cloud geometry compression//Proceedings of the 22nd IEEE International Workshop on Multimedia Signal Processing (MMSP). Tampere, Finland: IEEE: 1-6 [DOI: 10.1109/MMSP48831.2020.9287077]
Rethage D, Wald J, Sturm J, Navab N and Tombari F. 2018. Fully-convolutional point networks for large-scale point clouds//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 625-640 [DOI: 10.1007/978-3-030-01225-0_37]
Schnabel R and Klein R. 2006. Octree-based point-cloud compression//Proceedings of the 3rd Eurographics/IEEE VGTC Conference on Point-Based Graphics. Boston, USA: Eurographics Association: 111-121
Schwarz S, Martin-Cocher G and Flynn D. 2018. Common test conditions for point cloud compression. ISO/IEC JTC1/SC29/WG11 w17766. Slovenia: MPEG
Schwarz S, Preda M, Baroncini V, Budagavi M, Cesar P, Chou P A, Cohen R A, Krivokuća M, Lasserre S, Li Z, Llach J, Mammou K, Mekuria R, Nakagami O, Siahaan E, Tabatabai A, Tourapis A M and Zakharchenko V. 2019. Emerging MPEG standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9(1): 133-148 [DOI: 10.1109/JETCAS.2018.2885981]
Sullivan G J and Wiegand T. 1998. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine, 15(6): 74-90 [DOI: 10.1109/79.733497]
Tew P A. 2016. An investigation of sparse tensor formats for tensor libraries. Cambridge, USA: Massachusetts Institute of Technology
Tian D, Ochimizu H, Feng C, Cohen R and Vetro A. 2017. Geometric distortion metrics for point cloud compression//Proceedings of 2017 IEEE International Conference on Image Processing (ICIP). Beijing, China: IEEE: 3460-3464 [DOI: 10.1109/ICIP.2017.8296925]
Toderici G, O'Malley S M, Hwang S J, Vincent D, Minnen D, Baluja S, Covell M and Sukthankar R. 2016. Variable rate image compression with recurrent neural networks [EB/OL]. [2022-01-12]. https://arxiv.org/pdf/1511.06085.pdf
Torlig E M, Alexiou E, Fonseca T A, de Queiroz R L and Ebrahimi T. 2018. A novel methodology for quality assessment of voxelized point clouds//Applications of Digital Image Processing XLI. San Diego, USA: SPIE: 174-190 [DOI: 10.1117/12.2322741]
Wang J Q, Ding D D, Li Z and Ma Z. 2021b. Multiscale point cloud geometry compression//Proceedings of 2021 Data Compression Conference (DCC). Snowbird, USA: IEEE: 73-82 [DOI: 10.1109/DCC50243.2021.00015]
Wang J Q, Zhu H, Liu H J and Ma Z. 2021a. Lossy point cloud geometry compression via end-to-end learning. IEEE Transactions on Circuits and Systems for Video Technology, 32(12): 4909-4923 [DOI: 10.1109/TCSVT.2021.3051377]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xu Y, Lu Y and Wen Z. 2017. Owlii dynamic human mesh sequence dataset. ISO/IEC JTC1/SC29/WG11 m41658. Macau, China: MPEG
Yan W, Shao Y, Liu S, Li T H, Li Z and Li G. 2019. Deep autoencoder-based lossy geometry compression for point clouds [EB/OL]. [2022-01-12]. https://arxiv.org/pdf/1905.03691.pdf
Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2021. Residual dense network for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(7): 2480-2495 [DOI: 10.1109/TPAMI.2020.2968521]