Illuminant color estimation via deep residual learning

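Cui Shuai, Zhang Jun, Gao Jun (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)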

Abstract
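Objective Color constancy usually refers to the adaptive ability of humans to perceive object colors correctly under arbitrary illuminants, and it is an important prerequisite for high-level tasks such as recognition, segmentation, and 3D vision. Estimating the illuminant color of an image is one of the main routes to computational color constancy, but existing illuminant estimation methods often suffer large errors caused by ambiguous colors in local scene regions. To address this, we propose an illuminant color estimation method based on deep residual learning. Method The input image is divided uniformly into patches, and the global illuminant color of the whole image is estimated from the illuminant colors of the local patches. The algorithm comprises two residual networks, one for illuminant color estimation and one for patch selection: the illuminant estimation network improves accuracy through a deeper network hierarchy and a residual structure; the patch selection network classifies patches by their illuminant estimation error, and patches with large errors are discarded according to the classification result, further improving the accuracy of the global estimate. In addition, the input image is preprocessed into log-chrominance form, which reduces the influence of image luminance on the estimate and improves computational efficiency. Result Experiments on the NUS-8 and reprocessed ColorChecker datasets show that the proposed method achieves good accuracy and robustness; moreover, under the same conditions, log-chrominance images yield estimation errors 10% to 15% lower than the original images, and the patch selection network further reduces the error of the illuminant estimation network by about 5%. Conclusion Experiments on two single-illuminant datasets show that the overall design of the proposed method is sound and effective and that the algorithm is accurate and robust; it can be applied in image processing, computer vision, and other fields that require color correction.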
Illuminant estimation via deep residual learning

Cui Shuai, Zhang Jun, Gao Jun(School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Color constancy refers to the human ability to perceive an object as having a consistent color under varying illuminants. It is an important prerequisite for high-level tasks, such as recognition, segmentation, and 3D vision. In the computer vision community, the goal of computational color constancy is to remove illuminant color casts and obtain accurate color representations of images. Illuminant estimation is therefore an important means of achieving computational color constancy, but it is a difficult and underdetermined problem because the observed image color is influenced by unknown factors, such as scene illuminants and object reflectances. Illuminant estimation methods can be categorized into two classes: statistics-based (or static) methods and learning-based methods. Statistics-based methods estimate the illuminant from the statistical properties (e.g., reflectance distributions) of the image. Learning-based methods learn a model from training images and then estimate the illuminant using the model. Convolutional neural networks (CNNs) are powerful tools for estimating illuminants, and many competitive results have been obtained with CNN-based methods. We propose a CNN-based illuminant estimation algorithm in this study. We use deep residual learning to improve network accuracy and a patch selection network to overcome the color ambiguity of local patches. Method We uniformly sample local patches from the image, estimate the local illuminant of each patch individually, and generate a global illuminant estimate for the entire image by combining the local illuminants. We use a 64×64 patch size in the patch sampling to guarantee the estimation accuracy of the local illuminant and to provide sufficient training inputs without data augmentation. The proposed approach includes two residual networks, namely, the illuminant estimation net (IEN) and the patch selection net (PSN). 
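The patch-based pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract gives the 64×64 patch size but does not say whether patches overlap or how local estimates are combined, so the non-overlapping grid and the per-channel median used here, as well as the function names `patch_origins` and `combine_local_estimates`, are hypothetical choices rather than the authors' method.

```python
from statistics import median

PATCH = 64  # patch side length stated in the abstract


def patch_origins(height, width, patch=PATCH):
    """Top-left corners of non-overlapping patch x patch tiles.

    Assumption: a regular, non-overlapping grid; border pixels that do
    not fill a whole tile are ignored.
    """
    return [(y, x)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


def combine_local_estimates(local_estimates):
    """Combine per-patch illuminant estimates into one global estimate.

    Assumption: a per-channel median is used as the combination rule.
    Each estimate is a tuple with one value per channel.
    """
    channels = zip(*local_estimates)
    return tuple(median(channel) for channel in channels)
```

For example, a 128×200 image yields a 2×3 grid of tiles, and one outlying patch estimate has no effect on the median-combined result.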
IEN estimates the local illuminant of image patches. To improve the estimation accuracy of IEN, we deepen the feature extraction hierarchy by increasing the network depth and use the residual structure to ensure gradient propagation and facilitate the training of the deep network. The main body of IEN is a residual structure that consists of many stacked 3×3 and 1×1 convolutional layers, batch normalization layers, and rectified linear unit layers. The remaining part is composed of one global average pooling layer and one fully connected layer. We use Euclidean loss and stochastic gradient descent (SGD) to optimize IEN. PSN shares a similar architecture with IEN, except that PSN has an additional Softmax layer that serves as the classifier at the end of the network. PSN classifies image patches according to their illuminant estimation errors. We use cross-entropy loss and SGD to optimize PSN. According to the results of PSN, patches with a large estimation error are removed from the image, thus improving the performance of global illuminant estimation. Additionally, we preprocess the input image with the log-chrominance algorithm, which converts a three-channel RGB image into a two-channel log-chrominance image; this reduces the influence of image luminance and improves computational efficiency by cutting the amount of input data by one third. Result We implement the proposed IEN and PSN with the Caffe library. To evaluate the performance of our approach, we use two standard single-illuminant datasets, namely, the NUS-8 dataset and the reprocessed ColorChecker dataset. Both datasets include indoor and outdoor images, and a Macbeth ColorChecker is placed in each image to calculate the ground-truth illuminant. The NUS-8 dataset contains 1,736 images captured with 8 different cameras, and the reprocessed ColorChecker dataset consists of 568 images from 2 cameras. 
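To make the log-chrominance preprocessing mentioned in the Method part concrete, the following sketch converts a single RGB pixel into two log-chrominance channels. The abstract does not give the exact formula, so the definition used here, u = log(g/r) and v = log(g/b), is an assumption borrowed from common practice in the color constancy literature; the function names and the `eps` guard are likewise hypothetical.

```python
import math


def log_chrominance(r, g, b, eps=1e-6):
    """Map one RGB pixel to (u, v) = (log(g/r), log(g/b)).

    Assumed definition, not confirmed by the abstract. Values are
    clamped to eps so that dark pixels do not produce log(0).
    """
    r, g, b = (max(c, eps) for c in (r, g, b))
    return math.log(g / r), math.log(g / b)


def illuminant_from_log_chrominance(u, v):
    """Invert (u, v) to a unit-norm RGB direction."""
    r, g, b = math.exp(-u), 1.0, math.exp(-v)
    norm = math.sqrt(r * r + g * g + b * b)
    return r / norm, g / norm, b / norm
```

Because both channels are ratios, scaling a pixel's brightness leaves (u, v) unchanged, which illustrates why this representation reduces the influence of luminance while storing two values per pixel instead of three.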
Following the configurations of previous studies, we report the following metrics: the mean, the median, the tri-mean, the mean of the lowest 25%, and the mean of the highest 25% of angular errors. We additionally report the 95th percentile for the reprocessed ColorChecker dataset. We divide the NUS-8 dataset into eight subsets, apply three-fold cross-validation to each subset individually, and report the geometric mean of these metrics over all eight subsets. We directly apply three-fold cross-validation to the reprocessed ColorChecker dataset. Experimental results show that the proposed approach is competitive with state-of-the-art methods. For the NUS-8 dataset, the proposed IEN achieves the best results among all compared methods, and the proposed PSN further increases the precision of the IEN results. For the reprocessed ColorChecker dataset, our results are comparable with those of other advanced methods. In addition, we conduct ablation studies to evaluate the components of the proposed approach. We compare the proposed IEN with several shallower CNNs, and the results show that deep residual learning is effective in improving illuminant estimation accuracy. Moreover, compared with estimation on the original image, log-chrominance preprocessing reduces the illuminant estimation error by 10% to 15%. The proposed PSN further decreases the global illuminant estimation error by 5% compared with the method that uses IEN alone. Finally, we evaluate the time cost of our method on a PC with an Intel i5 2.7 GHz CPU, 16 GB of memory, and an NVIDIA GeForce GTX 1080Ti GPU. Our code takes less than 1.4 s to estimate the illuminant of a 2K image, which has a typical resolution of 2,048×1,080 pixels. 
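The error metrics listed in the Result part can be illustrated with a short sketch. It assumes the conventional definitions: the angular error is the angle between the estimated and ground-truth illuminant vectors, and the tri-mean is (Q1 + 2·Q2 + Q3)/4. The linear-interpolation percentile and the helper names are choices made for this example; the abstract does not specify them.

```python
import math
from statistics import mean


def angular_error_deg(est, gt):
    """Angle in degrees between estimated and ground-truth illuminants."""
    dot = sum(e * g for e, g in zip(est, gt))
    norm = math.sqrt(sum(e * e for e in est)) * math.sqrt(sum(g * g for g in gt))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize(errors):
    """Summary statistics of a non-empty list of angular errors."""
    e = sorted(errors)
    k = max(1, len(e) // 4)  # size of the best/worst quarter
    q1, q2, q3 = (percentile(e, q) for q in (25, 50, 75))
    return {
        "mean": mean(e),
        "median": q2,
        "trimean": (q1 + 2.0 * q2 + q3) / 4.0,
        "best25": mean(e[:k]),
        "worst25": mean(e[-k:]),
        "p95": percentile(e, 95),
    }
```

Calling `summarize` on the per-image errors of one cross-validation run gives the six figures reported per dataset; the geometric mean over the eight NUS-8 cameras would be computed on top of these per-camera summaries.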
Conclusion Experiments on the two single-illuminant datasets show that the proposed approach, which combines log-chrominance preprocessing, a network structure based on deep residual learning, and patch selection for global illuminant estimation, is reasonable and effective. The proposed approach has high precision and robustness and can be widely used in image processing and computer vision systems that require color correction.
