Low-light image enhancement guided by semantic segmentation and HSV color space
Zhang Hang, Yan Jia (School of Electronic Information, Wuhan University) Abstract
Objective Low-light image enhancement is one of the fundamental tasks in image processing. Although various methods have been proposed, they often fail to produce visually appealing results: the enhanced images suffer from unclear details, low contrast, and color distortion, which also harms downstream tasks such as object detection and semantic segmentation. To address these problems, this paper proposes a low-light image enhancement method guided by semantic segmentation and the HSV color space. Method First, an iterative image enhancement network is proposed to progressively learn the optimal pixel-level mapping between the low-light image and the enhanced image. To preserve semantic information during enhancement, an unsupervised semantic segmentation network, which requires no expensive segmentation annotations, is introduced and a semantic loss is computed from it. To further address color distortion, an HSV loss is designed in the HSV color space during training; to address the unclear details common in low-light image enhancement, a spatial consistency loss is designed to keep the enhanced image as detail-consistent as possible with the corresponding low-light image. The total loss function consists of five loss terms. Result The proposed method is compared with five methods: LIME (low-light image enhancement), RetinexNet, EnlightenGAN, Zero-DCE (zero-reference deep curve estimation), and SGZ (semantic-guided zero-shot learning). In peak signal-to-noise ratio (PSNR), it outperforms Zero-DCE by 0.32 dB on average; in the natural image quality evaluator (NIQE), it improves on EnlightenGAN by 6%. Subjectively, the proposed method produces better visual results. Conclusion The proposed low-light image enhancement method effectively alleviates unclear details and color distortion, and has practical value.
Keywords
Low-light image enhancement guided by semantic segmentation and HSV color space
Zhang Hang, Yan Jia (School of Electronic Information, Wuhan University) Abstract
Objective Due to unavoidable environmental and technical limitations, such as insufficient lighting and limited exposure time, images are often captured under sub-optimal lighting conditions and are degraded by backlight, uneven illumination, and weak light. The quality of such images suffers, and the information they transmit to high-level tasks, such as object tracking, recognition, and detection, is also unsatisfactory. Although various methods have been proposed, they often fail to produce visually appealing results: the enhanced images exhibit unclear details, low contrast, and color distortion. Existing deep learning methods achieve better accuracy, robustness, and speed than traditional methods, but because they are typically trained on synthetic datasets, their generalization is generally poor. For example, supervised methods require paired low-light and normal-light images, and the resulting models often produce very poor visual results when applied to real low-light images. To address these problems, a low-light image enhancement method guided by semantic segmentation and the HSV color space is proposed. It restores the true colors and detailed textures of objects without requiring excessive computing resources, and because its training is non-reference, the model generalizes better than supervised learning.

Method Our framework is an end-to-end low-light image enhancement network built from seven convolutional layers in a symmetric, U-Net-like structure. The input is a low-light image, and the output is a set of best-fitting curve parameter maps. By iteratively applying the curve, all pixels in the RGB channels of the input low-light image are mapped to produce the final enhanced image. The curve automatically maps the low-light image to the enhanced image; its parameters are adaptive, depend only on the input image, and are learned by the network. After the network estimates the curve parameter map of the input image, the curve is applied repeatedly for image enhancement, and the enhancement results are evaluated and guided by a set of non-reference loss functions. Meanwhile, the result of the last enhancement iteration is fed into an unsupervised semantic segmentation network to preserve the semantic information of the image. Our loss function comprises five terms, sketched in the code below: 1) a spatial consistency loss keeps the details of the enhanced image consistent with those of the original image, addressing the unclear details produced by most low-light image enhancement methods. The enhanced result and the low-light image are divided into several small local regions, and the loss keeps the difference between each local region and its surrounding neighboring regions in the enhanced result as close as possible to the corresponding difference in the low-light image; 2) an HSV loss restores the color information of the image. The enhanced result and the input low-light image are converted from RGB to the HSV color space, and the hue and saturation differences for each pixel between the enhanced result and the corresponding low-light image are computed; smaller differences in both hue and saturation indicate that the colors stay closer to the original colors of the low-light image; 3) an exposure loss enhances brightness by pulling each pixel's brightness toward a certain middle value that represents the ideal exposure, raising the overall brightness level of the final image; 4) a semantic loss retains semantic information. The unsupervised semantic segmentation network performs pixel-wise segmentation on the enhanced image, obtaining a predicted class probability for each pixel, and this probability is used to define the semantic loss; 5) a total variation loss maintains the differences between adjacent pixels of the image. The estimated curve parameter map is smoothed so that the curve parameter values of adjacent pixels stay close to each other, preserving the monotonicity of the curve as much as possible.
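The following minimal PyTorch sketch illustrates the curve-based enhancement and the five losses described above. It assumes the quadratic curve LE(x) = x + alpha * x * (1 - x) popularized by Zero-DCE, on which this framework builds; the region sizes, loss weights, mid-level exposure value, the rgb_to_hs helper, and the confidence-based form of the semantic loss are illustrative assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def apply_curve(x, alpha, n_iter=8):
        # Iteratively apply the quadratic curve LE(x) = x + alpha*x*(1-x)
        # to every pixel of every RGB channel; alpha matches x in shape.
        for _ in range(n_iter):
            x = x + alpha * x * (1.0 - x)
        return x

    def rgb_to_hs(x, eps=1e-8):
        # Minimal RGB -> (hue, saturation) conversion for x in [0, 1].
        r, g, b = x[:, 0], x[:, 1], x[:, 2]
        cmax, _ = x.max(dim=1)
        cmin, _ = x.min(dim=1)
        delta = cmax - cmin + eps
        h = torch.where(cmax == r, ((g - b) / delta) % 6.0,
            torch.where(cmax == g, (b - r) / delta + 2.0,
                                   (r - g) / delta + 4.0)) / 6.0
        s = (cmax - cmin) / (cmax + eps)
        return h, s

    def spatial_consistency_loss(enh, low, region=4):
        # 1) Keep differences between neighboring local regions
        # consistent between the enhanced result and the low-light input.
        E = F.avg_pool2d(enh.mean(1, keepdim=True), region)
        I = F.avg_pool2d(low.mean(1, keepdim=True), region)
        loss = 0.0
        for shift in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            dE = E - torch.roll(E, shift, dims=(2, 3))
            dI = I - torch.roll(I, shift, dims=(2, 3))
            loss = loss + ((dE.abs() - dI.abs()) ** 2).mean()
        return loss

    def hsv_loss(enh, low):
        # 2) Penalize per-pixel hue and saturation changes in HSV space
        # (circular wrap-around of hue is ignored for brevity).
        h_e, s_e = rgb_to_hs(enh)
        h_l, s_l = rgb_to_hs(low)
        return (h_e - h_l).abs().mean() + (s_e - s_l).abs().mean()

    def exposure_loss(enh, level=0.6, region=16):
        # 3) Pull the mean brightness of each local region toward a
        # mid-level value representing the ideal exposure.
        mean = F.avg_pool2d(enh.mean(1, keepdim=True), region)
        return ((mean - level) ** 2).mean()

    def semantic_loss(probs):
        # 4) probs: per-pixel class probabilities (N, C, H, W) from the
        # unsupervised segmentation network run on the enhanced image.
        # Encouraging confident predictions preserves semantics.
        return (1.0 - probs.max(dim=1).values).mean()

    def tv_loss(alpha):
        # 5) Smooth the curve parameter map so adjacent pixels receive
        # similar curve parameters.
        dh = (alpha[:, :, 1:, :] - alpha[:, :, :-1, :]) ** 2
        dw = (alpha[:, :, :, 1:] - alpha[:, :, :, :-1]) ** 2
        return dh.mean() + dw.mean()

    def total_loss(enh, low, alpha, probs):
        # Weighted sum of the five terms; the weights are placeholders,
        # not the paper's tuned values.
        return (spatial_consistency_loss(enh, low)
                + hsv_loss(enh, low)
                + 10.0 * exposure_loss(enh)
                + 0.1 * semantic_loss(probs)
                + 200.0 * tv_loss(alpha))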
Result The proposed method was compared with five methods: LIME (low-light image enhancement), RetinexNet, EnlightenGAN, Zero-DCE (zero-reference deep curve estimation), and SGZ (semantic-guided zero-shot learning). The quality of the enhanced images is evaluated objectively with the full-reference metrics peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and mean absolute error (MAE) and the no-reference metric natural image quality evaluator (NIQE), while subjective visual quality is also taken into account for a comprehensive evaluation; a small sketch of computing the full-reference metrics is given below. PSNR measures the level of noise and distortion in an image: in theory, a higher value indicates a smaller error between the enhanced image and the reference image and therefore higher quality. SSIM is a perceptual model aligned with human vision that measures the similarity between the enhanced image and the reference image in terms of contrast, brightness, and structure; a higher SSIM indicates that the enhanced image is closer to the reference. A smaller MAE indicates a smaller deviation from the reference image. NIQE compares the image with a statistical model of natural images, and a lower value indicates greater similarity to natural real images. In PSNR, our method is 0.32 dB higher than Zero-DCE; in NIQE, it improves on EnlightenGAN by 6%. From a subjective point of view, our method resolves the unclear details and color distortion present in other methods and yields better visual results.

Conclusion We introduced an unsupervised semantic segmentation network that segments the enhanced images pixel by pixel, preserving semantic information during enhancement. A loss function designed in the HSV color space restores the colors of low-light images, and a spatial consistency loss keeps the enhanced images as detail-consistent as possible with their corresponding low-light inputs. Subjective and objective evaluations demonstrate the superiority of our method: the proposed enhancement outperforms the other methods both qualitatively and quantitatively, effectively addressing unclear details and color distortion in low-light images, and demonstrating its practical value.
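For concreteness, here is a small sketch of how the full-reference metrics could be computed, assuming a recent version of scikit-image and images stored as float arrays in [0, 1]; NIQE is a no-reference, model-based metric not included in scikit-image, so it would need a separate implementation.

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate_pair(enhanced, reference):
        # Full-reference scores for one image pair; both arguments are
        # float arrays of shape (H, W, 3) with values in [0, 1].
        psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
        ssim = structural_similarity(reference, enhanced,
                                     channel_axis=-1, data_range=1.0)
        mae = float(np.abs(reference - enhanced).mean())
        return {"PSNR": psnr, "SSIM": ssim, "MAE": mae}

In practice, these per-pair scores would be averaged over the whole test set to produce the figures reported above.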
Keywords