
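Yue Huanjing1, Yang Wenhan2, Li Chongyi3, Yang You4, Liu Wenyu4, Yang Jingyu1 (1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072; 2. Department of Strategic and Advanced Interdisciplinary, PengCheng Laboratory, Shenzhen 518055; 3. College of Computer Science, Nankai University, Tianjin 300350; 4. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074)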

Abstract
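Low-level vision reconstruction aims to reconstruct high-quality images/videos under constrained imaging conditions and is important for subsequent visual processing and presentation. Because raw data have a high bit depth and respond linearly to the amount of incident light, raw-domain vision reconstruction has received increasing attention from academia and industry in recent years. This paper focuses on six representative vision reconstruction tasks, namely low-light enhancement and denoising, super-resolution, high dynamic range (HDR) reconstruction, demoiréing, multi-task joint reconstruction, and data generation, and reviews the progress of deep learning-driven raw-domain vision reconstruction. It systematically summarizes the representative methods in these fields, outlines the strengths and limitations of each class of methods, and analyzes, for each task, the unique properties and advantages of raw data over color-domain data (data that have undergone denoising, demosaicking, white balance, tone mapping, and color space conversion, e.g., to RGB or sRGB). It surveys the open-source datasets in each field, including image, burst, and video datasets, and summarizes the dataset construction methods and the spatial/temporal alignment strategies for paired data, providing references and guidance for creating datasets in future research. Finally, it summarizes the problems and difficulties of existing methods and discusses future trends in raw-domain low-level vision reconstruction.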
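To make the contrast between raw-domain and color-domain data concrete, the following is a minimal Python sketch of a toy processing chain applied to one 2 × 2 RGGB block. It assumes a 10-bit sensor with black level 64 and white level 1023, hypothetical white-balance gains, naive block averaging in place of demosaicking, an identity color matrix (so color space conversion is skipped), a plain gamma curve in place of tone mapping, and no denoising. Real ISPs are far more elaborate; the sketch only shows how the output loses the raw bit depth and the linear relation to irradiance.

```python
def toy_isp(raw_block, black_level=64, white_level=1023,
            wb_gains=(2.0, 1.0, 1.6), gamma=1 / 2.2):
    """Map one 2x2 RGGB raw block ((R, G1), (G2, B)) to an 8-bit RGB pixel.

    All default parameters are illustrative assumptions, not values from
    any specific camera or reviewed method.
    """
    (r, g1), (g2, b) = raw_block
    scale = white_level - black_level
    # 1) Black-level subtraction and normalization to [0, 1] (still linear).
    norm = [max(v - black_level, 0) / scale for v in (r, g1, g2, b)]
    # 2) Naive "demosaicking": one RGB value per block, averaging the greens.
    rgb = [norm[0], (norm[1] + norm[2]) / 2, norm[3]]
    # 3) White balance with per-channel gains, clipped to [0, 1].
    rgb = [min(c * g, 1.0) for c, g in zip(rgb, wb_gains)]
    # 4) Gamma curve as a stand-in for tone mapping (breaks linearity).
    rgb = [c ** gamma for c in rgb]
    # 5) Quantize to 8 bits (discards the extra raw bit depth).
    return tuple(round(c * 255) for c in rgb)
```

Comparing `toy_isp(((164, 264), (264, 164)))` with `toy_isp(((264, 464), (464, 264)))`, whose black-subtracted values are exactly doubled, the green output grows by roughly 2^(1/2.2) ≈ 1.37 rather than 2, so the processed values no longer scale with the captured light.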
Advances of low-level vision reconstruction in raw domain

Yue Huanjing1, Yang Wenhan2, Li Chongyi3, Yang You4, Liu Wenyu4, Yang Jingyu1 (1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. Department of Strategic and Advanced Interdisciplinary, PengCheng Laboratory, Shenzhen 518055, China; 3. College of Computer Science, Nankai University, Tianjin 300350, China; 4. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China)

Low-level vision reconstruction aims to reconstruct high-quality images and videos under limited imaging conditions, which is important for subsequent visual analysis. Images (videos) in the raw domain have two advantageous features: a wider bit depth (10, 12, or 14 bits) and intensity that is linear in the irradiance. As a result, raw images retain the most original information, and their noise statistics are simpler than those in the standard RGB (sRGB) domain. Therefore, low-level vision reconstruction with raw inputs has attracted increasing attention from the academic and industrial communities in recent years. This review focuses on low-level vision reconstruction in the raw domain and mainly investigates the progress of deep learning-based methods. Six representative raw-domain reconstruction tasks are selected for a comprehensive review, namely, low-light enhancement and denoising, super-resolution, high dynamic range (HDR) reconstruction, moiré removal, multi-task joint reconstruction, and raw image generation. Representative methods in the six fields are systematically summarized, the advantages and problems of various methods are outlined, and the advantages and unique attributes of raw images (videos) compared with sRGB images (videos) are highlighted for each task. Thereafter, the currently available open-source raw-domain datasets in each field are summarized, including image, burst image, and video datasets. The dataset construction methods for each task are introduced, as are different strategies for solving the key problems in dataset construction, namely, spatial alignment and temporal alignment. We hope these summaries and comparisons can serve as a reference for researchers who construct their own datasets. This review also points out that the six tasks not only have unique problems but also share common issues.
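The claim that raw noise statistics are simpler can be illustrated with the widely used Poisson-Gaussian (shot plus read noise) sensor model, under which the noise variance of a raw pixel is an affine function of its mean. The sketch below simulates a hypothetical 12-bit sensor with an assumed gain of 2 DN per electron and an assumed read-noise standard deviation of 3 DN, then recovers the gain as the slope of variance versus mean from two flat patches. It uses only the Python standard library; all function names and parameters are illustrative assumptions, not taken from any reviewed method.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_raw(electrons, rng, gain=2.0, read_std=3.0, bits=12):
    """Hypothetical raw pixel: gain * Poisson(electrons) + Gaussian read noise."""
    dn = gain * sample_poisson(electrons, rng) + rng.gauss(0.0, read_std)
    # Quantize and clip to the sensor bit depth (adds about 1/12 DN^2 variance).
    return min(max(round(dn), 0), 2 ** bits - 1)


def mean_var(samples):
    """Sample mean and unbiased sample variance."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
    return m, v


def estimate_noise_params(low, high, n=20000, gain=2.0, read_std=3.0, seed=0):
    """Fit var = a * mean + b from two flat patches.

    Under the model, mean = gain * electrons and
    var = gain**2 * electrons + read_std**2 = gain * mean + read_std**2,
    so a should be close to gain and b close to read_std**2.
    """
    rng = random.Random(seed)
    m1, v1 = mean_var([simulate_raw(low, rng, gain, read_std) for _ in range(n)])
    m2, v2 = mean_var([simulate_raw(high, rng, gain, read_std) for _ in range(n)])
    a = (v2 - v1) / (m2 - m1)
    return a, v1 - a * m1
```

For example, `estimate_noise_params(20, 80)` should return a slope near 2 and an intercept near 9 (plus the small quantization term). After a nonlinear tone curve such as a gamma, this affine mean-variance relation no longer holds, which is one reason noise modeling is easier in the raw domain.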
For example, for denoising and enhancement of videos captured in low light, constructing a supervised dataset with realistic motion and fine-scale textures is still difficult. For multi-frame super-resolution, the key problem is constructing an accurate alignment module. For HDR reconstruction, deghosting performance still needs to be improved in dark and over-exposed areas. For demoiréing, how to balance color recovery against moiré removal remains to be explored. For multi-task joint reconstruction, improving the adjustability and interpretability of the model is a key problem. Meanwhile, all six tasks need to recover correct colors while completing their own objectives; however, color recovery and the primary task have different optimization directions. Introducing dedicated modules that keep these optimization directions consistent may be a good solution. In addition, achieving accurate alignment between degraded and ground-truth images is difficult, and many datasets exhibit misalignment. We then review representative industrial applications of raw-domain reconstruction, including intelligent image signal processing and night imaging in smartphones, low-light and HDR imaging in security surveillance cameras, and raw-domain detection in driverless cars. Finally, based on the existing problems and challenges of raw-domain vision reconstruction, we identify four possible development trends. The first is exploiting the specific properties of raw images (videos) for specific tasks. Current methods usually utilize the advantages of raw data in terms of wider bit depth and linear response to irradiance, and only a few works utilize the specialized structures of raw data. For example, the moiré distribution differs across color channels, and the green channel usually has higher intensities than the other channels. We expect more works to explore the special properties of raw data in popular tasks such as denoising and super-resolution.
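Exploiting channel-specific structure usually starts by separating the Bayer mosaic into its color planes. A common preprocessing step in raw-domain networks packs an H × W mosaic into four H/2 × W/2 planes. The sketch below assumes an RGGB layout (other sensors use GRBG, GBRG, or BGGR), and the function names are hypothetical.

```python
def pack_rggb(mosaic):
    """Split an H x W RGGB mosaic (list of rows) into four H/2 x W/2 planes.

    Assumes the top-left pixel is red; adjust the offsets for other layouts.
    """
    h, w = len(mosaic), len(mosaic[0])
    if h % 2 or w % 2:
        raise ValueError("mosaic height and width must be even")
    # (row offset, column offset) of each color site within a 2x2 cell.
    offsets = {"R": (0, 0), "G1": (0, 1), "G2": (1, 0), "B": (1, 1)}
    return {
        name: [row[dx::2] for row in mosaic[dy::2]]
        for name, (dy, dx) in offsets.items()
    }


def channel_means(planes):
    """Mean intensity of each packed plane, e.g. to compare green vs. red/blue."""
    return {
        name: sum(map(sum, plane)) / (len(plane) * len(plane[0]))
        for name, plane in planes.items()
    }
```

With a made-up 4 × 4 mosaic `[[10, 40, 12, 42], [38, 20, 41, 22], [11, 39, 13, 43], [40, 21, 42, 23]]`, whose green sites were chosen to be brighter, `pack_rggb` yields `[[10, 12], [11, 13]]` as the red plane, and `channel_means` reports the two green planes above the blue and red ones. Per-plane statistics of this kind are where channel-dependent patterns, such as moiré that differs between channels, can be handled separately.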
The second is improving the availability of large-scale raw data. Many cameras do not provide raw outputs because of the large memory cost; therefore, existing raw-domain datasets are usually smaller than those in the sRGB domain. A feasible solution is to design raw image compression methods guided by sRGB images, enabling raw-domain decoding from a small amount of metadata. The third is alleviating the data-bias problem. A model trained with raw data captured by one camera may not work well on raw images captured by other cameras, so alleviating data bias is important for real applications. One feasible solution is to jointly utilize physics- and data-driven models. The fourth is further improving raw reconstruction performance with large models, since the scale of data is important for improving reconstruction quality. One solution is to first train a large model on a large-scale dataset and then distill it into a small model, which can then be deployed on various edge devices. In summary, we expect more works to explore low-level vision reconstruction in the raw domain to improve the imaging quality of various vision systems.
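The large-to-small route in the fourth trend is commonly realized with knowledge distillation. One simple form, assumed here for illustration because the review does not prescribe a specific scheme, trains the student with a weighted sum of a supervised L1 loss against the ground truth and an L1 loss against the frozen teacher's output. The sketch operates on flattened pixel lists; `distillation_loss` and its `alpha` weight are hypothetical names.

```python
def l1(a, b):
    """Mean absolute error between two equally long flattened images."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Hypothetical student objective: blend ground-truth and teacher L1 terms.

    alpha = 1 gives plain supervised training; alpha = 0 only imitates
    the (frozen) large teacher model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    supervised = l1(student_out, target)
    imitation = l1(student_out, teacher_out)
    return alpha * supervised + (1 - alpha) * imitation
```

In practice both terms would be computed on image tensors inside a deep learning framework, with gradients flowing only into the student; the plain-list version only shows how the two targets are combined.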