Dual-pixel imaging and applications: an overview
Vol. 27, Issue 12, Pages 3395-3414 (2022)
Accepted: 13 March 2022
Published: 16 December 2022
DOI: 10.11834/jig.210984
Yuchao Dai, Feiyu Zhang, Liyuan Pan, Mochu Xiang, Mingyi He. 2022. Dual-pixel imaging and applications: an overview [J]. Journal of Image and Graphics, 27(12): 3395-3414.
Dual-pixel (DP) CMOS autofocus (DP CMOS AF) adopts hybrid-detection autofocus: each pixel is equipped with two photodiodes, so every pixel participates in both focusing and imaging, overcoming the drawbacks of conventional phase-detection and contrast-detection autofocus. By estimating from the defocus disparity the lens movement required to bring the image into focus, DP autofocus achieves faster focusing speed and higher focusing accuracy, and is therefore widely used in handheld devices such as mobile-phone cameras and DSLRs. Because a DP sensor splits each pixel into two halves, a single shot yields two images. This pair of images (the DP image pair) can be regarded as a perfectly rectified small-baseline stereo pair with identical exposure time; its disparity corresponds to the degree of image blur and exists only in out-of-focus regions. Beyond autofocus, DP sensors can also be used for depth estimation, defocus deblurring, reflection removal, and related tasks. This paper systematically reviews the autofocus mechanism, imaging principle, and research status of DP sensors, and looks ahead to their future development. 1) Autofocus techniques are introduced, and conventional autofocus is compared with DP autofocus. 2) The imaging principle, imaging model, and characteristics of DP sensors are analyzed in detail. 3) The latest progress of DP applications in computer vision is systematically surveyed, with a comprehensive analysis of depth estimation, reflection removal, and defocus deblurring. 4) Since appropriate datasets are the foundation of deep-learning methods, current DP datasets are introduced. 5) The challenges and opportunities of DP sensors in computer vision are analyzed, and future application directions are discussed.
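The core of DP autofocus described above, estimating the defocus disparity between the two half-pixel views and mapping it to a lens movement, can be illustrated with a toy 1-D correlation search. This is a minimal sketch under my own assumptions (the function name, the integer-shift search, and the synthetic edge scene are illustrative); real PDAF hardware uses calibrated phase correlation, not this brute-force loop:

```python
import numpy as np

def estimate_defocus_disparity(left, right, max_shift=8):
    """Estimate the horizontal defocus disparity between the two DP views
    by an exhaustive 1-D correlation search over integer pixel shifts."""
    left = left - left.mean()
    right = right - right.mean()
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # circularly shift the right view and score its agreement with the left
        score = float((left * np.roll(right, s, axis=1)).sum())
        if score > best_score:
            best_score, best_shift = score, s
    return best_shift

# Toy scene: a vertical edge shifted horizontally by 3 px between the views,
# mimicking the defocus disparity of an out-of-focus region.
left = np.zeros((16, 32))
left[:, 10:] = 1.0
right = np.roll(left, 3, axis=1)
print(estimate_defocus_disparity(left, right))  # -3: the right view must move back 3 px to align
```

In an actual camera, the estimated disparity is then converted, through a per-device calibration, into the lens displacement needed to reach focus in a single movement, which is why DP AF is faster than iterative contrast detection.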
The dual-pixel (DP) sensor is a hardware autofocus technique introduced by Canon in 2013. Conventional autofocus methods fall into two major categories: phase-detection and contrast-detection. However, phase-detection autofocus has higher electronic complexity, while contrast-detection autofocus runs slowly in practice. Hybrid-detection autofocus, which dedicates some pixels to focusing and the rest to imaging, has therefore attracted more attention, but it cannot avoid a loss of resolution. DP-based hybrid autofocus instead enables every pixel to serve both imaging and focusing, improving focusing accuracy cost-efficiently. It has consequently been widely adopted in mobile-phone cameras and digital single-lens reflex (DSLR) cameras. In recent years, DP sensors have been offered by major sensor manufacturers and occupy the vast majority of the camera-sensor market. To guarantee both focusing and imaging performance, each pixel in a DP sensor is equipped with two photodiodes, splitting the pixel into two halves so that two images are obtained simultaneously. These two images (the DP image pair) can be viewed as a perfectly rectified stereo pair with a tiny baseline and identical exposure time, or as a two-view light field. Unlike ordinary stereo pairs, DP image pairs exhibit disparity only in out-of-focus regions; in-focus regions have none. This defocus disparity is directly related to the depth of the captured scene and is generated by the point spread function (PSF). The PSFs of the left and right DP views are approximately mirror-symmetric, and the PSFs in front of and behind the focal plane are likewise approximately symmetric. These relationships, together with the special shape of the DP PSF, provide extra information for various computer vision tasks. The DP image pair can therefore be used for depth estimation, defocus deblurring, and reflection removal beyond the autofocus application of DP sensors. In particular, the relationship between depth and blur size in a DP sensor is effective for the depth-from-defocus and defocus-deblurring tasks.

We critically review the autofocus mechanism, imaging principle, and research status of DP sensors. 1) To provide a basic understanding of DP sensors, we introduce the DP imaging model and imaging principle. 2) To highlight recent breakthroughs, we comparatively analyze DP research in recent years. 3) To serve as a further reference for researchers, we survey the current open-source DP datasets and simulators that facilitate data acquisition. Specifically, we first describe DP sensors from the point of view of autofocus, covering three conventional autofocus methods: 1) phase-detection autofocus (PDAF), 2) contrast-detection autofocus (CDAF), and 3) hybrid autofocus; the principle and advantages of DP autofocus are critically reviewed in Section I. In Section II, we review the relevant optical concepts and the camera imaging model, and introduce the imaging principle and geometric features of DP sensors in four aspects: 1) DP geometry, 2) DP affine ambiguity, 3) the DP point spread function, and 4) the difference between a DP image pair and a stereo image pair. This shows how DP image pairs can aid downstream tasks and how effective hidden information can be mined from them: through the affine ambiguity, the DP defocus disparity is linked to scene depth, which serves as a cue for depth estimation, defocus deblurring, and related tasks. In Section III, we summarize the applications of DP image pairs in three computer vision tasks: 1) depth estimation, 2) reflection removal, and 3) defocus deblurring. As appropriate datasets are fundamental to designing deep-learning architectures that outperform conventional methods, we briefly introduce the community-derived DP datasets and summarize the algorithmic principles of the current DP simulators. Finally, the future challenges and opportunities of the DP sensor are discussed in Section V.
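The link the abstract draws between defocus disparity, blur size, and scene depth can be made concrete with the thin-lens model. The sketch below is my own illustrative formulation (the function name, units, and parameter choices are not from the paper): the DP defocus disparity is proportional to this signed blur radius, which vanishes at the focal plane and flips sign across it.

```python
def blur_radius(depth, focus_depth, focal_length, f_number):
    """Signed circle-of-confusion radius under the thin-lens model
    (all distances in metres): zero exactly in focus, positive behind
    the focal plane, negative in front of it."""
    aperture_radius = focal_length / f_number / 2.0
    # lens-to-sensor distance when focused at focus_depth (thin-lens equation)
    v = 1.0 / (1.0 / focal_length - 1.0 / focus_depth)
    return aperture_radius * v * (1.0 / focus_depth - 1.0 / depth)

# A 50 mm f/2 lens focused at 2 m: no blur (hence no DP disparity) at 2 m,
# and opposite-signed blur in front of vs. behind the focal plane.
print(blur_radius(2.0, 2.0, 0.05, 2.0))  # 0.0
print(blur_radius(4.0, 2.0, 0.05, 2.0) > 0, blur_radius(1.0, 2.0, 0.05, 2.0) < 0)  # True True
```

This monotone depth-to-blur mapping is what lets DP methods treat disparity estimation, depth-from-defocus, and defocus deblurring as tightly coupled problems.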
Keywords: dual-pixel (DP); autofocus; deep learning; camera imaging; reflection removal; depth estimation