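李国庆, 赵洋, 刘青萌, 殷翔宇, 王业南 (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009)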
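Objective Image quality assessment is one of the fundamental research topics in computer vision, image processing, and related fields. Traditional assessment methods are usually based on low-level visual features of images and ignore high-level semantic information, which to some extent limits the consistency between objective metrics and subjective visual quality. In recent years, perceptual loss has been widely used in research on image stylization, image restoration, and similar tasks: by using a pre-trained deep network to decompose images into multiple semantic layers, it has achieved good results on these problems. Inspired by perceptual loss, we propose a full-reference image quality assessment method based on multi-layer perceptual decomposition. Method First, a pre-trained deep network is used to decompose the image into multiple semantic layers and obtain multi-layer feature maps. Then the similarity between the distorted image and the reference image, as well as the similarity between their feature maps at different levels, is computed, finally yielding an image quality score that takes high-level semantic information into account. Result Experiments were conducted on the traditional methods PSNR (peak signal-to-noise ratio), SSIM (structural similarity), MS-SSIM (multi-scale structural similarity), and FSIM (feature similarity). The results show that the proposed method effectively improves the performance of traditional image quality assessment methods, with corresponding gains on the objective criteria SRCC (Spearman rank order correlation coefficient), KRCC (Kendall rank order correlation coefficient), PLCC (Pearson linear correlation coefficient), and RMSE (root mean squared error). With the proposed framework, PSNR, SSIM, MS-SSIM, and FSIM gain 0.02, 0.07, 0.06, and 0.04, respectively, in SRCC on the TID2013 database. Conclusion The proposed full-reference image quality assessment method based on multi-layer perceptual decomposition combines traditional methods with deep learning methods and takes into account both low-level visual features and high-level semantic information. It thereby effectively improves the evaluation performance of traditional methods and makes objective assessment results more consistent with subjective visual perception. Moreover, the proposed evaluation framework can be applied to improve the performance of a variety of traditional methods.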
Multi-layer perceptual decomposition-based full-reference image quality assessment
Li Guoqing,Zhao Yang,Liu Qingmeng,Yin Xiangyu,Wang Yenan(School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China;Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009, China)
Objective IQA (image quality assessment) is one of the fundamental research topics in computer vision and image processing. Traditional IQA methods mainly rely on individual pixel intensities or low-level visual features, such as image contrast and edges, and generally ignore high-level semantic information. PSNR (peak signal-to-noise ratio) is a basic and commonly used tool that directly compares the pixel intensities of the test image with those of the reference image. By contrast, the human visual system extracts structural information from visual scenes, so PSNR cannot accurately measure subjective visual quality. To extract structural information and achieve better evaluation, various improved IQA methods have been proposed, many of which first decompose an image into different components to extract information that effectively measures image quality. However, these traditional methods still omit high-level semantic information. With the rapid development of deep learning, high-level semantic information can be effectively extracted by deep networks: owing to their hierarchical structure, deep networks can analyze and understand images at different levels. In recent years, perceptual loss based on deep networks has been widely used in many computer vision applications, such as image style transfer, non-photorealistic rendering, and image restoration, where using a pre-trained deep network to decompose an image into different semantic levels has produced satisfactory results. Inspired by perceptual loss, we proposed a multi-layer perceptual decomposition-based full-reference image quality assessment method. Method First, a pre-trained deep network was used to decompose the input image and extract multi-layer feature maps.
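For reference, PSNR is conventionally defined as PSNR = 10·log10(MAX²/MSE), where MSE is the mean squared difference between corresponding pixels and MAX is the largest possible pixel value. The minimal Python sketch below assumes 8-bit images (MAX = 255) stored as NumPy arrays; the function name `psnr` is illustrative. Because the score depends only on per-pixel differences, it makes concrete why PSNR is blind to structure and semantics.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images of the same shape."""
    ref = reference.astype(np.float64)
    dist = distorted.astype(np.float64)
    mse = np.mean((ref - dist) ** 2)  # mean squared pixel-intensity difference
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

ref = np.full((4, 4), 100, dtype=np.uint8)
dist = ref.copy()
dist[0, 0] = 110  # a single pixel off by 10, so MSE = 100 / 16 = 6.25
print(psnr(ref, dist))
```

Any spatial rearrangement of the same error would give an identical PSNR, which is the limitation the structural and semantic methods discussed here try to address.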
Many pre-trained deep networks could be employed for this purpose. On the basis of previous studies on perceptual loss, the VGG-19 network was selected for its effectiveness. VGG-19 is composed of several types of layers, including convolutional, activation function, pooling, dropout, fully connected, and softmax layers, which are stacked in a specific order to form a complete network model. This network has been widely applied because it achieves impressive results in many recognition tasks. To reduce complexity, only several of its layers were selected as abstraction layers for extracting feature maps. Second, the proposed method calculated not only the similarity between the test image and the reference image but also the similarity between their multi-level feature maps. Lower-level feature maps reflect differences in edges, details, textures, and other low-level features, whereas higher-level feature maps reflect differences in saliency and semantics in regions of interest. Finally, an image quality score that accounts for high-level semantic similarity was obtained. Unlike existing DNN (deep neural network)-based IQA methods, the proposed method used the pre-trained deep network only to decompose the image rather than to fit the subjective mean opinion scores, so no new IQA network had to be trained. Moreover, the proposed method is an open and flexible framework that improves the performance of traditional methods by extracting additional high-level semantic information. Therefore, numerous traditional full-reference IQA methods can be further improved with the proposed framework. In this paper, several typical and efficient traditional IQA methods were improved and evaluated with the proposed method.
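The decompose-then-compare idea can be sketched as follows. This is not the authors' implementation and rests on several stated assumptions: VGG-19 is replaced by a hypothetical `toy_feature_pyramid` (at each level, gradient magnitude followed by 2×2 average pooling, loosely mimicking convolution and pooling), the base metric is a simplified single-window SSIM (`global_ssim`) rather than the usual sliding-window version, and the final score is an equal-weight mean of the image-level and feature-level similarities, a combination rule assumed here because it is not specified above. In a faithful version, the feature maps would come from the selected abstraction layers of a pre-trained VGG-19.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray) -> float:
    """Single-window SSIM over the whole map (simplified; real SSIM uses local
    windows). K1=0.01, K2=0.03, dynamic range taken from the reference map."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    data_range = max(float(x.max() - x.min()), 1e-8)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def toy_feature_pyramid(img: np.ndarray, levels: int = 3) -> list:
    """Hypothetical stand-in for VGG-19 feature maps: each level computes the
    gradient magnitude of the previous map, then applies 2x2 average pooling."""
    feats = []
    x = img.astype(np.float64)
    for _ in range(levels):
        gy, gx = np.gradient(x)
        x = np.sqrt(gx ** 2 + gy ** 2)  # edge-like, non-negative response
        h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x2 pool
        feats.append(x)
    return feats

def perceptual_iqa(reference, distorted, base=global_ssim, levels: int = 3) -> float:
    """Mean of base(ref, dist) on the image and on every feature level.
    Equal weighting is an assumption made for this sketch."""
    scores = [base(reference, distorted)]
    for f_ref, f_dist in zip(toy_feature_pyramid(reference, levels),
                             toy_feature_pyramid(distorted, levels)):
        scores.append(base(f_ref, f_dist))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, size=(32, 32))
mild = ref + rng.normal(0, 5, size=ref.shape)     # weak additive noise
strong = ref + rng.normal(0, 40, size=ref.shape)  # strong additive noise
print(perceptual_iqa(ref, ref))  # identical images should score (numerically) 1
print(perceptual_iqa(ref, mild) > perceptual_iqa(ref, strong))
```

Passing a different function as `base` mirrors the claim that the framework can wrap several traditional full-reference metrics; how the levels are weighted is a free design choice in this sketch.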
These IQA methods included PSNR, SSIM (structural similarity), and two of its effective variants, namely MS-SSIM (multi-scale structural similarity) and FSIM (feature similarity). Other full-reference IQA methods can also be improved by the proposed semantic decomposition-based framework. Result The experimental data were taken from the TID2013 dataset, which includes 25 reference images and 3,000 distorted images. Compared with other existing databases, TID2013 contains more images and distortion types, which makes the results more reliable. The experimental results for the selected traditional methods, namely PSNR, SSIM, MS-SSIM, and FSIM, showed that the proposed method can effectively improve the performance of traditional image quality assessment methods, with corresponding improvements in the objective criteria SRCC (Spearman rank order correlation coefficient), KRCC (Kendall rank order correlation coefficient), PLCC (Pearson linear correlation coefficient), and RMSE (root mean squared error). On the TID2013 dataset, SRCC increased by 0.02, 0.07, 0.06, and 0.04 for PSNR, SSIM, MS-SSIM, and FSIM, respectively. SRCC and KRCC measure prediction monotonicity, PLCC measures prediction accuracy, and RMSE measures prediction consistency. With the proposed method, these traditional assessments attained higher SRCC, KRCC, and PLCC values and much lower RMSE values than the corresponding conventional IQA methods. In addition, the results for different distortion types demonstrated that the proposed method can effectively improve performance. Conclusion This paper proposed a full-reference image quality assessment method based on perceptual decomposition that combined the benefits of traditional methods and deep learning methods.
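The four criteria can be computed from paired objective scores and subjective MOS values as in the minimal NumPy sketch below, written for illustration only. Two simplifications are assumed: KRCC is implemented as Kendall's tau-a, which does not correct for ties (SciPy's `kendalltau`, for instance, defaults to tau-b), and PLCC and RMSE are computed on the raw scores, whereas IQA studies commonly fit a nonlinear (for example, logistic) mapping from objective scores to MOS before computing those two criteria. Whether such a mapping was used in the experiments above is not stated here.

```python
import numpy as np

def rankdata(a) -> np.ndarray:
    """1-based ranks; tied values receive the average of their positions."""
    a = np.asarray(a, dtype=np.float64)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a), dtype=np.float64)
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def plcc(x, y) -> float:
    """Pearson linear correlation coefficient (prediction accuracy)."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def srcc(x, y) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(rankdata(x), rankdata(y))

def krcc(x, y) -> float:
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s / (n * (n - 1) / 2))

def rmse(pred, mos) -> float:
    """Root mean squared error between predicted scores and MOS."""
    d = np.asarray(pred, float) - np.asarray(mos, float)
    return float(np.sqrt(np.mean(d ** 2)))

mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [0.2, 0.1, 0.5, 0.7, 0.9]  # one pair ordered opposite to MOS
print(srcc(pred, mos), krcc(pred, mos))
```

In this toy example a single swapped pair lowers SRCC to 0.9 and KRCC to 0.8, which shows how the rank-based criteria reward a metric that orders images the way human observers do, independently of its numeric scale.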
By simultaneously considering the low-level visual features and high-level semantic information, the proposed method effectively improved the evaluation performance of traditional methods. By incorporating the additional high-level semantic information, the IQA results became more consistent with the subjective visual perception. Furthermore, the proposed evaluation framework can also be applied to other traditional full reference IQA methods.