Anti-specular light-field depth estimation algorithm
2020, Vol. 25, No. 12, Pages: 2630-2646
Received: 2019-10-15
Revised: 2020-01-14
Accepted: 2020-01-21
Published in print: 2020-12-16
DOI: 10.11834/jig.190526
Objective
A light field camera can capture the spatial and angular information of the light rays in a scene in a single exposure, which provides the basis for depth estimation. However, specular highlights in light field scenes make depth estimation difficult. To improve the reliability of depth estimation in the presence of highlights, this paper proposes an anti-specular depth estimation method based on the multi-view context information of light field images.
Method
Exploiting the multi-view nature of light field sub-aperture images, we create multi-view input branches to extract feature information from images under different viewpoints. Dilated convolutions are used to enlarge the network receptive field and capture a wider range of image context information, so that the depth of highlight regions can be recovered from the depth of non-highlight regions lying on the same depth plane. In addition, we design a novel multi-scale feature fusion method that concatenates dilated convolution features with multiple dilation rates and ordinary convolution features with multiple kernel sizes, further improving the accuracy and smoothness of the estimation results.
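As a rough illustration of how the multi-view input branches can be fed, the sketch below slices horizontal, vertical, and diagonal sub-aperture stacks out of a 4D light field array. The array layout, the square angular resolution, and the central-view index are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def directional_view_stacks(light_field, center=4):
    """Slice a (U, V, H, W, C) light field into four directional sub-aperture
    stacks (horizontal, vertical, left-diagonal, right-diagonal).

    The square angular layout and the central-view index are illustrative
    assumptions, not values taken from the paper.
    """
    U, V, H, W, C = light_field.shape
    idx = np.arange(U)
    horizontal = light_field[center, :, ...]       # fix u, sweep v
    vertical = light_field[:, center, ...]         # fix v, sweep u
    left_diag = light_field[idx, idx, ...]         # u == v
    right_diag = light_field[idx, idx[::-1], ...]  # u == U - 1 - v
    # Each stack has shape (num_views, H, W, C); fold the views into the
    # channel axis so a 2D CNN branch can consume them.
    fold = lambda s: s.transpose(1, 2, 0, 3).reshape(H, W, -1)
    return [fold(s) for s in (horizontal, vertical, left_diag, right_diag)]
```

Each of the four returned arrays can then be passed to its own input branch of the network.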
Result
The proposed method is compared with four state-of-the-art methods on three datasets. Experimental results show that the method achieves good overall depth estimation performance. On the 4D light field benchmark synthetic dataset, compared with the second-best model, the mean square error (MSE) is reduced by 20.24%, the bad pixel rate (BP) is reduced by 2.62%, and the peak signal-to-noise ratio (PSNR) is improved by 4.96%. Qualitative analysis on the CVIA (computer vision and image analysis) Konstanz specular synthetic dataset and on real scenes captured with a Lytro Illum camera further verifies the effectiveness and reliability of the algorithm. Ablation results show that the multi-scale feature fusion method improves depth estimation in highlight regions.
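For reference, the three quantities reported above can be computed as in the minimal sketch below; the 0.07 bad-pixel threshold follows the common benchmark convention echoed later in the text, and the peak value used for PSNR is an assumption.

```python
import numpy as np

def mse(pred, gt):
    """Mean square error between estimated and ground-truth disparity maps."""
    return float(np.mean((pred - gt) ** 2))

def bad_pixel_ratio(pred, gt, threshold=0.07):
    """Fraction of pixels whose absolute disparity error exceeds the threshold."""
    return float(np.mean(np.abs(pred - gt) > threshold))

def psnr(pred, gt, peak=None):
    """PSNR in dB; the peak defaults to the ground-truth range (an assumption)."""
    peak = np.ptp(gt) if peak is None else peak
    return float(10.0 * np.log10(peak ** 2 / mse(pred, gt)))
```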
Conclusion
The proposed depth estimation model can effectively estimate image depth information. In particular, it recovers the depth of highlight regions with high accuracy, produces smooth object edges, and preserves image details well.
Objective
Image depth, which refers to the distance from a point in a scene to the central plane of the camera, reflects the 3D geometric information of the scene. Reliable depth information is important in many visual tasks, including image segmentation, target detection, and 3D surface reconstruction, and depth estimation has become one of the most important research topics in computer vision. With the development of sensor technology, light field cameras, as new multi-angle image acquisition devices, have made it far more convenient to acquire light field data. These cameras can simultaneously acquire the spatial and angular information of a scene and show unique advantages in depth estimation. At present, most available light field depth estimation methods can obtain highly accurate depth information in many scenes. However, these methods implicitly assume that objects have Lambertian surfaces or surfaces with a uniform reflection coefficient. When specular reflection or non-Lambertian surfaces appear in a scene, depth information cannot be accurately obtained. Specular reflection is commonly observed in real-world scenes when light strikes objects such as metals, plastics, ceramics, and glass. It tends to change the color of an object and obscure its texture, leading to a loss of local area information. Previous studies have shown that the specular region changes with the viewing angle; furthermore, we can infer information about the specular area from the context of its surroundings. Inspired by these principles, we propose an anti-specular depth estimation method based on the context information of light field images, which improves the reliability of the algorithm in handling problems associated with specular reflection.
Method
Based on how an image changes with the viewing angle, we design our network around the light field geometry: we select the horizontal, vertical, left-diagonal, and right-diagonal directions and create four independent yet identical sub-aperture image processing branches. In this configuration, the network generates four direction-specific depth feature representations that are combined at a later stage. Under a fixed light direction, because of occlusion by foreground objects or differences in the incident angle of the light, not every part of a smooth surface lying at the same depth appears as a highlight; moreover, the degree of specular reflection varies across a smooth surface and indirectly reveals its geometric characteristics. Therefore, we process each sub-aperture image branch with dilated convolutions, which expand the network receptive field. The network thus obtains a wide range of image context information and uses it to restore the depth of the specular region, as sketched below.
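A minimal Keras sketch of one such directional branch follows; the layer count, filter width, and dilation rates are illustrative assumptions rather than the exact configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def directional_branch(input_shape, name, filters=64, dilation_rates=(1, 2, 4)):
    """One sub-aperture image branch: a stack of views in, branch features out.

    Dilated convolutions enlarge the receptive field so that context from
    non-specular regions on the same depth plane can inform specular pixels.
    Glorot uniform initialization is the Keras default for Conv2D, matching
    the initialization described in the text.
    """
    inputs = tf.keras.Input(shape=input_shape)  # e.g. (H, W, num_views * channels)
    x = inputs
    for rate in dilation_rates:
        x = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=rate, activation="relu")(x)
        x = layers.BatchNormalization()(x)
    return tf.keras.Model(inputs, x, name=name)
```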
To improve the depth estimation accuracy in the specular area, we apply a novel multi-scale feature fusion method in which dilated convolution features with multiple dilation rates are concatenated with ordinary convolution features with multiple kernel sizes to obtain fused features. To enhance the robustness of the depth estimation, we use a series of residual modules to reintroduce part of the feature information lost in earlier convolution layers, learn the relationships among the fused features, and encode these relationships into higher-dimensional features. We use TensorFlow as the training backend and Keras to build the network, choose RMSprop as the optimizer, and set the batch size to 16. Model parameters are initialized with the Glorot uniform initializer, and the initial learning rate of 1E-4 decays to 1E-6 over the course of training. We use the mean absolute error (MAE) as the loss function because of its robustness to outliers. Experiments run on an Intel i7-5820K @ 3.30 GHz processor with a GeForce GTX 1080Ti GPU; training the network for 200 epochs takes approximately 2 to 3 days.
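To make the fusion and training description concrete, the following sketch (continuing the branch sketch above and reusing its imports) merges the four branch outputs, concatenates multi-rate dilated features with multi-kernel ordinary convolution features, stacks a few residual blocks, and compiles the model with RMSprop and an MAE loss. The dilation rates, kernel sizes, filter counts, number of residual blocks, and output head are all assumptions made for illustration.

```python
def multi_scale_fusion(x, filters=64):
    # Concatenate multi-rate dilated features with multi-kernel plain conv features.
    dilated = [layers.Conv2D(filters, 3, padding="same", dilation_rate=r,
                             activation="relu")(x) for r in (2, 4, 8)]
    plain = [layers.Conv2D(filters, k, padding="same",
                           activation="relu")(x) for k in (1, 3, 5)]
    return layers.Concatenate()(dilated + plain)

def residual_block(x, filters=128):
    # Standard residual block: reintroduce earlier features via a skip connection.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    skip = layers.Conv2D(filters, 1, padding="same")(x)  # project to matching channels
    return layers.ReLU()(layers.Add()([skip, y]))

def build_model(branch_shapes):
    inputs = [tf.keras.Input(shape=s) for s in branch_shapes]
    feats = [directional_branch(s, name=f"branch_{i}")(inp)
             for i, (s, inp) in enumerate(zip(branch_shapes, inputs))]
    x = layers.Concatenate()(feats)
    x = multi_scale_fusion(x)
    for _ in range(3):                                   # block count is an assumption
        x = residual_block(x)
    disparity = layers.Conv2D(1, 3, padding="same")(x)   # single-channel disparity map
    model = tf.keras.Model(inputs, disparity)
    # MAE loss and RMSprop as described above; the 1E-4 -> 1E-6 decay could be
    # applied with a LearningRateScheduler callback over the 200-epoch training.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4), loss="mae")
    return model
```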
Result
The 4D light field benchmark synthetic scene dataset was used for quantitative experiments, and the computer vision and image analysis (CVIA) Konstanz specular synthetic scene dataset and a real scene dataset captured with a Lytro Illum camera were used for qualitative experiments. We used three evaluation criteria in the quantitative experiments, namely, mean square error (MSE), bad pixel rate (BP), and peak signal-to-noise ratio (PSNR). Experimental results show that the proposed method improves depth estimation. Our quantitative analysis on the 4D light field benchmark synthetic dataset shows that, relative to the second-best model, the proposed method reduces the MSE by 20.24%, achieves a BP (0.07) value that is 2.62% lower, and improves the PSNR by 4.96%. Meanwhile, in our qualitative analysis of the CVIA Konstanz specular synthetic dataset and the real scene dataset captured by the Lytro Illum, the proposed algorithm achieves good depth estimation results, thereby verifying its effectiveness in recovering depth information in specular highlight regions. We also perform ablation experiments on the receptive-field expansion and residual feature coding modules and find that the multi-scale feature fusion method improves depth estimation in the highlight areas and that the residual structure brings a further clear improvement.
Conclusion
Our model can effectively estimate image depth information. It recovers the depth of highlight regions with high accuracy, produces smooth object edge regions, and efficiently preserves image detail information.