Multi-scale cost volume information sharing based multi-view stereo reconstruction network
2022, Vol. 27, No. 11, pp. 3331-3342
Received: 2021-09-02
Revised: 2021-10-29
Accepted: 2021-11-06
Published in print: 2022-11-16
DOI: 10.11834/jig.210774

Objective
Multi-view stereo reconstruction is an important part of 3D vision. Compared with traditional methods, deep-learning-based methods greatly reduce the time required for reconstruction and also improve reconstruction completeness. However, the feature extraction of existing methods is only moderately effective and the correlation between cost volumes is weak, so the reconstruction results still leave room for improvement. To address these problems, this paper proposes a multi-view stereo reconstruction network with dual U-Net feature extraction and multi-scale cost volume information sharing.
Method
To obtain more complete and accurate feature information from the input images, a dual U-Net feature extraction module is designed, which outputs features at three scales in a coarse-to-fine cascade structure. In the cost volume regularization stage, a pre-processing module for multi-scale cost volume information sharing is designed: the information in the small-scale cost volume is separated and passed to the next-level cost volume for fusion, and depth maps are estimated from coarse to fine, which substantially improves reconstruction accuracy and completeness.
Result
In experiments on the DTU (Technical University of Denmark) dataset, compared with CasMVSNet, the three main metrics of accuracy error, completeness error, and overall error improve by about 16.2%, 6.5%, and 11.5%, respectively; the gains over other deep-learning-based methods are even larger, and several secondary metrics also improve to varying degrees.
Conclusion
The proposed multi-view stereo reconstruction network with dual U-Net feature extraction and multi-scale cost volume information sharing is effective in both the feature extraction and cost volume regularization stages, and its reconstruction accuracy improves over the original model and other methods, validating the effectiveness of the approach.
Objective
Multi-view stereo (MVS) reconstructs a 3D model of a scene from a set of images captured from multiple viewpoints with known camera parameters. It can reconstruct both small- and large-scale indoor and outdoor scenes, and it underpins the rapidly developing, virtual-reality-oriented 3D reconstruction technology. Traditional MVS methods mainly use hand-designed similarity metrics and regularization to compute dense correspondences, and can be broadly classified into four categories of algorithms: point-cloud-based, voxel-based, deformable-polygon-mesh-based, and depth-map-based. These methods achieve good results in ideal Lambertian scenes without weakly textured areas, but often fail to yield satisfactory reconstructions under texture scarcity, texture repetition, or lighting changes. Recent deep learning techniques in computer vision have driven new reconstruction architectures. Learning-based approaches can exploit global semantic information, for example priors on highlights and reflections, to obtain more robust matching, and deep learning has therefore been successfully applied on top of the traditional methods. In general, learning-based MVS inherits the epipolar geometry of stereo matching, effectively alleviates the occlusion problem, and achieves notable improvements in accuracy and generalization. However, existing methods still extract features of only moderate quality and exhibit poor correlation between cost volumes. We therefore propose a multi-view stereo network with dual U-Net feature extraction and multi-scale cost volume information sharing.
Method
Our improvements focus on feature extraction and the pre-processing before cost volume regularization. First, a dual U-Net module is designed for feature extraction. For input images with a resolution of 512 × 640 pixels, convolution and ReLU first lift the 3-channel image to 8 and then 32 channels, and feature maps at 1, 1/4, and 1/16 of the original image size are generated by max pooling and convolution. In the up-sampling stage, the multi-scale feature information is concatenated and fused along the channel dimension to obtain richer features. Further convolution and up-sampling yield a 32-channel feature map with the same resolution as the original image, which is fed through the U-Net once more to finally obtain three sets of feature maps at different scales. Such a dual U-Net feature extraction module preserves more detailed features through down-sampling (pooling layers reduce the spatial dimensions), up-sampling (restoring object detail and spatial resolution), and skip connections (recovering target detail), which makes the subsequent depth estimation more accurate and complete.
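The two-pass encoder-decoder flow above can be sketched in plain numpy. This is a toy illustration under stated assumptions, not the authors' implementation: 1 × 1 channel-mixing "convolutions" stand in for the real 3 × 3 convolution blocks, channel counts are illustrative, and a 64 × 80 image stands in for the 512 × 640 input.

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -- a 1x1 convolution is a channel mix
    return np.einsum('oc,chw->ohw', w, x)

def maxpool2(x):
    # 2x2 max pooling: halves height and width
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def upsample2(x):
    # nearest-neighbour up-sampling: doubles height and width
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_pass(x, rng, base=8):
    """One encoder-decoder pass with skip connections. Returns decoder
    features at 1, 1/4 and 1/16 of the input resolution."""
    C = x.shape[0]
    e0 = conv1x1(x, rng.standard_normal((base, C)))                              # 1/1
    e1 = conv1x1(maxpool2(maxpool2(e0)), rng.standard_normal((2 * base, base)))  # 1/4
    e2 = conv1x1(maxpool2(maxpool2(e1)), rng.standard_normal((4 * base, 2 * base)))  # 1/16
    d1 = np.concatenate([upsample2(upsample2(e2)), e1])      # skip connection at 1/4
    d1 = conv1x1(d1, rng.standard_normal((2 * base, d1.shape[0])))
    d0 = np.concatenate([upsample2(upsample2(d1)), e0])      # skip connection at 1/1
    d0 = conv1x1(d0, rng.standard_normal((base, d0.shape[0])))
    return d0, d1, e2

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 64, 80))   # toy stand-in for a 512x640 view
f_full, _, _ = unet_pass(img, rng)       # first U-Net pass
feats = unet_pass(f_full, rng)           # second pass reuses the full-res output
print([f.shape for f in feats])          # shapes at 1, 1/4 and 1/16 scale
```

The "dual" structure is simply the full-resolution output of the first pass being fed through the encoder-decoder a second time, so detail recovered by the first decoder is available to the second encoder.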
Second, the initially constructed cost volumes at different scales have no connection to each other and rely only on the up-sampling in the feature extraction module, so the information in each level's cost volume cannot be transferred. We therefore design a multi-scale cost volume information sharing module in the pre-regularization stage, which separates the information in the cost volume generated at each level and fuses it into the next level's cost volume. Fusing the small-scale cost volume information into the next, finer cost volume improves the quality of the estimated depth maps, which are predicted from coarse to fine.
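The coarse-to-fine sharing step can be sketched as follows. This is a minimal numpy sketch, not the paper's module: a hypothetical scalar weight `alpha` stands in for the learned separation-and-fusion convolutions, and each finer level is assumed to double the depth, height, and width of the cost volume.

```python
import numpy as np

def upsample_volume(v):
    # nearest-neighbour up-sampling over depth, height and width
    # v: (C, D, H, W)
    return v.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

def share_cost_volumes(volumes, alpha=0.5):
    """Coarse-to-fine fusion: inject each coarse cost volume's information
    into the next finer volume before regularization.

    `volumes` is ordered coarsest first; `alpha` is an illustrative mixing
    weight replacing the learned fusion layers of the actual module."""
    fused = [volumes[0]]
    for fine in volumes[1:]:
        coarse_up = upsample_volume(fused[-1])   # lift coarse info to fine grid
        fused.append(fine + alpha * coarse_up)   # fuse before 3D regularization
    return fused
```

The key point the sketch captures is that level k's volume is no longer regularized in isolation: it already carries the (up-sampled) evidence accumulated at level k-1, so depth hypotheses consistent across scales are reinforced.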
Result
The Technical University of Denmark (DTU) dataset used in the experiments is an indoor dataset captured and processed specifically for MVS, providing the intrinsic and extrinsic camera parameters of every view directly. It consists of 128 objects or scenes, split into 79 training scenes, 18 validation scenes, and 22 test scenes. Training is performed on Ubuntu 20.04 with an Intel Core i9-10920X CPU and an NVIDIA 3090 graphics card. Three main evaluation metrics are used: accuracy (Acc), the average distance from the reconstructed point cloud to the ground-truth point cloud; completeness (Comp), the average distance from the ground-truth point cloud to the reconstructed point cloud; and overall (Overall), the mean of accuracy and completeness. Several secondary metrics are also reported, such as the absolute depth error and the accuracy at absolute error thresholds of 2 mm and 4 mm. The experimental results show that our three main metrics, Acc, Comp, and Overall, improve by about 16.2%, 6.5%, and 11.5%, respectively, compared with the original method.
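The three main metrics are two directed nearest-neighbour distances and their mean. A brute-force sketch (the official DTU evaluation additionally applies distance thresholds and observability masks, which are omitted here):

```python
import numpy as np

def avg_nn_distance(src, dst):
    """Mean distance from each point in `src` to its nearest neighbour in
    `dst`. Brute-force O(N*M); fine for small point clouds."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def dtu_metrics(recon, gt):
    acc = avg_nn_distance(recon, gt)    # accuracy: reconstruction -> ground truth
    comp = avg_nn_distance(gt, recon)   # completeness: ground truth -> reconstruction
    return acc, comp, (acc + comp) / 2  # overall: mean of the two
```

Note the asymmetry: a sparse but precise reconstruction scores well on Acc and poorly on Comp, while a dense but noisy one does the opposite, which is why the mean (Overall) is reported as the headline number.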
Conclusion
Our reconstruction network is built on a multi-view stereo network with dual U-Net feature extraction and multi-scale cost volume information sharing. It brings significant gains in both the feature extraction and cost volume regularization stages, and the improved reconstruction accuracy demonstrates the method's potential.
Aanæs H, Jensen R R, Vogiatzis G, Tola E and Dahl A B. 2016. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 120(2): 153-168[DOI: 10.1007/s11263-016-0902-9]
Campbell N D F, Vogiatzis G, Hernández C and Cipolla R. 2008. Using multiple hypotheses to improve depth-maps for multi-view stereo//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer: 766-779[DOI: 10.1007/978-3-540-88682-2_58]
Chen R, Han S F, Xu J and Su H. 2019. Point-based multi-view stereo network//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1538-1547[DOI: 10.1109/ICCV.2019.00162]
Furukawa Y and Ponce J. 2009. Carved visual hulls for image-based modeling. International Journal of Computer Vision, 81(1): 53-67[DOI: 10.1007/s11263-008-0134-8]
Furukawa Y and Ponce J. 2010. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8): 1362-1376[DOI: 10.1109/TPAMI.2009.161]
Galliani S, Lasinger K and Schindler K. 2015. Massively parallel multiview stereopsis by surface normal diffusion//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 873-881[DOI: 10.1109/ICCV.2015.106]
Gu X D, Fan Z W, Zhu S Y, Dai Z Z, Tan F T and Tan P. 2020. Cascade cost volume for high-resolution multi-view stereo and stereo matching//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2492-2501[DOI: 10.1109/CVPR42600.2020.00257]
Huang P H, Matzen K, Kopf J, Ahuja N and Huang J B. 2018. DeepMVS: learning multi-view stereopsis//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2821-2830[DOI: 10.1109/CVPR.2018.00298]
Ji M Q, Gall J, Zheng H T, Liu Y B and Fang L. 2017. SurfaceNet: an end-to-end 3D neural network for multiview stereopsis//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2326-2334[DOI: 10.1109/ICCV.2017.253]
Kar A, Häne C and Malik J. 2017. Learning a multi-view stereo machine[EB/OL]. [2021-08-17]. https://arxiv.org/pdf/1708.05375.pdf
Kendall A, Martirosyan H, Dasgupta S, Henry P, Kennedy R, Bachrach A and Bry A. 2017. End-to-end learning of geometry and context for deep stereo regression//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 66-75[DOI: 10.1109/ICCV.2017.17]
Li Z X, Wang K Q, Zuo W M, Meng D Y and Zhang L. 2016. Detail-preserving and content-aware variational multi-view stereo reconstruction. IEEE Transactions on Image Processing, 25(2): 864-877[DOI: 10.1109/TIP.2015.2507400]
Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L and Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428
龙霄潇, 程新景, 朱昊, 张朋举, 刘浩敏, 李俊, 郑林涛, 胡庆拥, 刘浩, 曹汛, 杨睿刚, 吴毅红, 章国锋, 刘烨斌, 徐凯, 郭裕兰, 陈宝权. 2021. 三维视觉前沿进展. 中国图象图形学报, 26(6): 1389-1428[DOI: 10.11834/jig.210043]
Luo W J, Schwing A G and Urtasun R. 2016. Efficient deep learning for stereo matching//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5695-5703[DOI: 10.1109/CVPR.2016.614]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI: 10.1007/978-3-319-24574-4_28]
Schönberger J L, Zheng E L, Frahm J M and Pollefeys M. 2016. Pixelwise view selection for unstructured multi-view stereo//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 501-518[DOI: 10.1007/978-3-319-46487-9_31]
Sinha S N, Mordohai P and Pollefeys M. 2007. Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh//Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil: IEEE: 1-8[DOI: 10.1109/ICCV.2007.4408997]
Tola E, Strecha C and Fua P. 2012. Efficient large-scale multi-view stereo for ultra high-resolution image sets. Machine Vision and Applications, 23(5): 903-920[DOI: 10.1007/s00138-011-0346-8]
Yan S, Zhang M J, Fan Y C, Tan X H, Liu Y, Peng Y and Liu Y X. 2021. Progress in the large-scale outdoor image 3D reconstruction. Journal of Image and Graphics, 26(6): 1429-1449
颜深, 张茂军, 樊亚春, 谭小慧, 刘煜, 彭杨, 刘宇翔. 2021. 大规模室外图像3维重建技术研究进展. 中国图象图形学报, 26(6): 1429-1449[DOI: 10.11834/jig.200842]
Yang G S, Manela J, Happold M and Ramanan D. 2019. Hierarchical deep stereo matching on high-resolution images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5510-5519[DOI: 10.1109/CVPR.2019.00566]
Yao Y, Luo Z X, Li S W, Fang T and Quan L. 2018. MVSNet: depth inference for unstructured multi-view stereo//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 785-801[DOI: 10.1007/978-3-030-01237-3_47]
Yao Y, Luo Z X, Li S W, Shen T W, Fang T and Quan L. 2019. Recurrent MVSNet for high-resolution multi-view stereo depth inference//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5520-5529[DOI: 10.1109/CVPR.2019.00567]
Yu Z H and Gao S H. 2020. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1946-1955[DOI: 10.1109/cvpr42600.2020.00202]
Žbontar J and LeCun Y. 2016. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 17(1): 2287-2318