High-quality rendered dataset and non-local convolutional network for intrinsic image decomposition

Wang Yujie; Fan Qingnan; Li Kun; Chen Dongdong; Yang Jingyu; Lu Jianzhi; Dani Lischinski; Chen Baoquan

Published: 2022-02-22
DOI: 10.11834/jig.210705
2022 | Volume 27 | Number 2

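Wang Yujie¹, Fan Qingnan², Li Kun³, Chen Dongdong⁴, Yang Jingyu³, Lu Jianzhi⁵, Dani Lischinski⁶, Chen Baoquan⁷ (1. Shandong University, Qingdao 266237; 2. Tencent AI Lab, Shenzhen 518057; 3. Tianjin University, Tianjin 300072; 4. Microsoft Cloud AI, Washington 98052, USA; 5. Sunvega Company, Guangzhou 510000; 6. The Hebrew University of Jerusalem, Jerusalem 91904, Israel; 7. Peking University, Beijing 100091)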

Abstract

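Objective Intrinsic image decomposition is a fundamental problem in computer vision and graphics that aims to separate the texture and illumination components of the scene in an image. Deep learning based intrinsic decomposition methods are limited by existing datasets and suffer from over-smoothed decomposition results and poor generalization to real data. Method First, a graph convolution based module is designed to explicitly take the non-local information in the image into account. In addition, so that the trained network can handle more complex lighting conditions, a high-quality synthetic dataset is rendered. Furthermore, a neural network based albedo refinement module is introduced to improve the local smoothness of the predicted albedo images. Result When different methods are trained on the proposed dataset rather than on the earlier synthetic dataset CGIntrinsics, the average weighted human disagreement rate (WHDR) on the intrinsic images in the wild (IIW) test set decreases by 7.29%, and the average precision (AP) on the shading annotations in the wild (SAW) test set increases by 2.74%. The proposed graph convolution based network also achieves good results on both the IIW and SAW datasets, and its visual results are significantly better than those of previous methods. Moreover, the intrinsic components obtained with the proposed algorithm yield better results in image editing tasks such as relighting, texture editing and lighting editing. Conclusion The proposed dataset is of higher quality and benefits the training of neural network based intrinsic decomposition models. Because the proposed decomposition model explicitly incorporates non-local priors, it obtains better intrinsic decomposition results, which are further validated through a series of application tasks.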

Keywords

image processing; image understanding; intrinsic image decomposition; graph convolutional network (GCN); synthetic dataset

High quality rendered dataset and non-local graph convolutional network for intrinsic image decomposition

Wang Yujie¹, Fan Qingnan², Li Kun³, Chen Dongdong⁴, Yang Jingyu³, Lu Jianzhi⁵, Dani Lischinski⁶, Chen Baoquan⁷ (1. Shandong University, Qingdao 266237, China; 2. Tencent AI Lab, Shenzhen 518057, China; 3. Tianjin University, Tianjin 300072, China; 4. Microsoft Cloud AI, Washington 98052, USA; 5. Sunvega Company, Guangzhou 510000, China; 6. The Hebrew University of Jerusalem, Jerusalem 91904, Israel; 7. Peking University, Beijing 100091, China)

Abstract

Objective Intrinsic image decomposition is a key problem in computer vision and graphics. It aims to separate the lighting effects from the material-related properties of the object surfaces in the depicted scene. Decomposing a single input image is highly ill-posed, since the number of unknowns is twice the number of known values. Most classical approaches model the task with handcrafted priors to generate reasonable decompositions, but they perform poorly in complicated scenarios because such priors are too limited to capture the complex light-material interactions in real-world scenes. Deep neural network based methods instead learn this knowledge automatically from data and avoid handcrafted priors. However, because they depend on training data, their performance is still constrained by the limitations of current intrinsic image datasets. Moreover, the learned networks tend to generalize poorly when the training and target domains differ substantially. Another issue is that the limited receptive field may restrict a model's ability to exploit non-local information when predicting the intrinsic components. Method A graph convolution based module is designed to fully utilize the non-local cues within the input feature space. The module takes a feature map as input and outputs a feature map with the same resolution. To produce the output feature vector at each position, the module combines the feature at that position, information extracted from its local neighborhood, and information aggregated from non-local neighbors that may be very distant. The full intrinsic decomposition framework is constructed by integrating this non-local feature learning module into a U-Net.
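The three information sources named above (the feature itself, its local neighborhood, and distant non-local neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract does not say how the graph is built or how the terms are fused, so the k-nearest-neighbor graph in feature space, the mean aggregation, the 3x3 local window, and the names `local_mean`, `nonlocal_mean`, `nonlocal_graph_layer`, `w_self`, `w_local` and `w_nonlocal` are all assumptions made for illustration.

```python
import numpy as np


def local_mean(feat):
    """Average each position's 3x3 spatial neighborhood (edge-padded)."""
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)), mode="edge")
    acc = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def nonlocal_mean(feat, k):
    """Average the k nearest neighbors of each position in feature space.

    Neighbors are chosen by feature similarity, not spatial distance, so
    they may lie anywhere in the map (assumed graph construction).
    """
    h, w, c = feat.shape
    nodes = feat.reshape(-1, c)                        # one graph node per position
    sq = np.sum(nodes ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * nodes @ nodes.T
    np.fill_diagonal(dist, np.inf)                     # a node is not its own neighbor
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]  # (N, k) neighbor indices
    return nodes[idx].mean(axis=1).reshape(h, w, c)


def nonlocal_graph_layer(feat, w_self, w_local, w_nonlocal, k=4):
    """Fuse self, local and non-local terms; spatial resolution is preserved."""
    feat = np.asarray(feat, dtype=np.float64)
    out = (feat @ w_self
           + local_mean(feat) @ w_local
           + nonlocal_mean(feat, k) @ w_nonlocal)
    return np.maximum(out, 0.0)                        # ReLU non-linearity
```

Calling `nonlocal_graph_layer` on an `(H, W, C)` map with three `(C, C_out)` weight matrices returns an `(H, W, C_out)` map. The dense pairwise distance matrix is quadratic in the number of positions, so a sketch like this is only practical on low-resolution feature maps.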
In addition, to improve the piecewise smoothness of the predicted albedo, a neural network based image refinement module is incorporated into the pipeline; it adaptively removes unwanted artifacts while preserving the structural information of the scenes depicted in the input images. Existing intrinsic image datasets also have noticeable limitations, including a limited number of samples, unrealistic scenes, achromatic lighting in shading, and sparse annotations, which cause generalization issues for deep learning models and limit decomposition performance. A new photorealistic rendered dataset for intrinsic image decomposition is therefore proposed. It is rendered from large-scale 3D indoor scene models with high-quality textures and lighting to simulate real-world environments, and it provides chromatic shading components for the first time. Result To validate the effectiveness of the proposed dataset, several state-of-the-art methods are trained on both the proposed dataset and the previously proposed CGIntrinsics dataset, and tested on the intrinsic image evaluation benchmarks, i.e., the intrinsic images in the wild (IIW) and shading annotations in the wild (SAW) test sets. Compared with the variants trained on CGIntrinsics, the variants trained on the proposed dataset show a 7.29% improvement in average weighted human disagreement rate (WHDR) on the IIW test set and a 2.74% gain in average precision (AP) on the SAW test set. The proposed graph convolution based network also achieves comparable quantitative results on both the IIW and SAW test sets, with significantly better qualitative results. To further investigate the decomposition quality of different methods, several application tasks, including relighting and texture/lighting editing, are conducted using the generated intrinsic components.
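For readers unfamiliar with the WHDR metric reported above, the following sketch shows how it is commonly computed on IIW-style pairwise judgments. It is an illustration under stated assumptions rather than the evaluation code used in this work: the function name `whdr`, the tuple layout of `comparisons`, and the threshold `delta=0.1` are assumed here, and the labels follow the usual IIW convention ("1" means point 1 is darker, "2" means point 2 is darker, "E" means roughly equal).

```python
def whdr(reflectance, comparisons, delta=0.1):
    """Weighted human disagreement rate for one image (lower is better).

    reflectance: 2-D array or nested list of positive scalar albedo values.
    comparisons: iterable of ((row1, col1), (row2, col2), label, weight),
                 where label is "1", "2" or "E" (assumed IIW-style convention).
    """
    disagree = 0.0
    total = 0.0
    for (r1, c1), (r2, c2), label, weight in comparisons:
        a1 = max(float(reflectance[r1][c1]), 1e-10)  # guard against division by zero
        a2 = max(float(reflectance[r2][c2]), 1e-10)
        if a2 / a1 > 1.0 + delta:
            predicted = "1"       # point 1 has clearly darker reflectance
        elif a1 / a2 > 1.0 + delta:
            predicted = "2"       # point 2 has clearly darker reflectance
        else:
            predicted = "E"       # reflectances are about equal
        total += weight
        if predicted != label:
            disagree += weight    # weighted count of disagreements with humans
    return disagree / total if total > 0 else 0.0
```

The per-image values are then averaged over the test set, which is why a lower average WHDR indicates closer agreement with human judgments.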
The proposed method produces more promising application results than two state-of-the-art methods, further highlighting its superiority and application potential. Conclusion Building on the non-local priors used in classical intrinsic image decomposition methods, a graph convolutional network that exploits non-local cues is proposed. To mitigate the issues in current intrinsic image datasets, a new high-quality photorealistic dataset is rendered, which provides dense labels for albedo and shading. The scenes depicted in the proposed dataset have complicated textures and illumination that closely approximate real indoor scenes, which helps to reduce the domain gap. The shading labels in this dataset are the first to account for chromatic lighting, which allows neural networks to better separate material properties from lighting effects, especially effects introduced by inter-reflections between diffuse surfaces. The decomposition results of the proposed method and two current state-of-the-art methods are applied to a range of application scenarios, which visually demonstrates the superior decomposition quality and application potential of the proposed method.
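The editing applications mentioned above can be sketched under the commonly used multiplicative model, in which an image is the per-pixel, per-channel product of albedo and shading. The abstract does not state the image formation model, so this model and the helper names `compose`, `relight` and `retexture` are assumptions for illustration only. With chromatic shading, each pixel has three known RGB values but six unknowns (three for albedo, three for shading), which matches the remark that the unknowns are twice the known values.

```python
import numpy as np


def compose(albedo, shading):
    """Recombine intrinsic components: I = A * S per pixel and channel."""
    return np.asarray(albedo, dtype=np.float64) * np.asarray(shading, dtype=np.float64)


def relight(albedo, shading, light_tint):
    """Lighting edit: tint the chromatic shading with an RGB factor, then recompose."""
    new_shading = np.asarray(shading, dtype=np.float64) * np.asarray(light_tint, dtype=np.float64)
    return compose(albedo, new_shading)


def retexture(image, albedo, new_albedo, eps=1e-6):
    """Texture edit: recover shading as I / A and apply it to a new albedo."""
    albedo = np.asarray(albedo, dtype=np.float64)
    shading = np.asarray(image, dtype=np.float64) / np.maximum(albedo, eps)
    return compose(new_albedo, shading)
```

Errors in the decomposition show up directly in such edits: lighting that leaks into the albedo stays baked into the surface after relighting, which is one reason application results are used as a qualitative check.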

Keywords

image processing; image understanding; intrinsic image decomposition; graph convolutional neural network (GCN); synthetic dataset