发布时间: 2021-07-16
图像分析和识别
收稿日期: 2020-09-04; 修回日期: 2021-01-25; 预印本日期: 2021-02-01
基金项目: 国家自然科学基金项目(61671283)
作者简介:
夏雨蒙, 1996年生, 女, 硕士研究生, 主要研究方向为图像视频的质量评估。E-mail: xym_96@shu.edu.cn
王永芳, 通信作者, 女, 教授, 主要研究方向为智能多媒体处理与分析、图像视频质量编码与评估。E-mail: yfw@shu.edu.cn
王闯, 男, 硕士研究生, 主要研究方向为感知模型及感知视频编码。E-mail: chuangwang@shu.edu.cn
中图法分类号: TP391.41
文献标识码: A
文章编号: 1006-8961(2021)07-1625-12
摘要
目的 全景图像的质量评价与其传输、处理过程并不在同一个空间进行,传统的评价算法无法准确反映用户观察球面场景时的真实感受。针对观察空间与处理空间不一致的问题,本文提出一种基于相位一致性的全参考全景图像质量评价模型。方法 首先将平面图像进行全景加权,使平面上的特征能准确反映球面空间的质量畸变。然后采用相位一致性互信息的相似度获取参考图像和失真图像的结构相似度。接着,利用相位一致性局部熵的相似度反映参考图像和失真图像的纹理相似度。最后将两部分相似度融合,得到全景图像的客观质量分数。结果 实验在全景质量评价数据集OIQA(omnidirectional image quality assessment)上进行,在原始图像中引入4种不同类型的失真,将提出的算法与6种主流算法进行性能对比,并单独比较了基于相位一致性互信息和相位一致性局部熵的性能,评价标准采用4项统计指标。实验结果表明,相比于现有的6种全景图像质量评估算法,本文算法在PLCC(Pearson linear correlation coefficient)和SRCC(Spearman rank order correlation coefficient)指标上比WS-SSIM(weighted-to-spherically-uniform structural similarity)算法高出0.4左右,在RMSE(root of mean square error)上低0.9左右,4项指标均为最优,能够获得更好的拟合效果。结论 本文算法解决了观察空间和映射空间不一致的问题,并融合了基于人眼感知的多尺度互信息相似度和局部熵相似度,获得与人眼感知更为一致的客观分数,评价效果更准确,更符合人眼视觉特征。
关键词
全景图像/视频; 质量评价; 人类视觉系统; 相位一致性; 结构相似度(SSIM); 纹理相似度
Abstract
Objective Panoramic images suffer distortion during acquisition, compression, and transmission. To provide viewers with an immersive experience, the resolution of a panoramic image is higher than that of a traditional image. The higher the resolution, the more bandwidth is needed for transmission and the more space is needed for storage. Image compression technology is therefore conducive to improving transmission efficiency, but it also introduces compression distortion. With viewers' increasing demand for panoramic image/video visual experience, research on virtual reality visual systems becomes increasingly important, and the quality evaluation of panoramic images/videos is an indispensable part of it. Traditional subjective observation of images is conducted on a screen, and objective quality assessment algorithms are designed for 2D planes. When assessing the quality of panoramic images, viewers freely switch perspectives to observe the whole spherical scene with the help of head-mounted equipment. However, transmission, storage, and processing are all performed on the projection format of the panoramic image, which causes an inconsistency between the observation and processing spaces. As a result, traditional assessment algorithms cannot accurately reflect viewers' real feelings when observing the sphere and cannot directly reflect the distortion degree of the spherical scene. To solve the problem of inconsistency between the observation and processing spaces, this study proposes a phase-consistency based panoramic image quality assessment (PC-PIQA) algorithm. Method Structure and texture information is rich in high-resolution panoramic images, and both are important features used by the human visual system to understand scene content. The proposed PC-PIQA model exploits these features to resolve the inconsistency between the observation space and the processing plane.
First, the equirectangular projection format is mapped to the cube map projection (CMP) format, and the panoramic weight under the CMP format is used to solve the problem of inconsistent observation and processing spaces. Then, the high-order phase-consistency mutual information of a single plane in the CMP format is calculated to describe the similarity of structural information between the reference and distorted images at different orders. Next, the texture similarity is calculated by using the similarity of the first-order phase-congruency local entropy. Finally, the visual quality of a single plane is obtained by fusing the two parts. According to the human eye's attention to panoramic content, different perceptual weights are assigned to the six planes to obtain the overall quality score. Result Experiments are conducted on the panoramic evaluation data set called omnidirectional image quality assessment (OIQA). Four different types of distortion are added to the original images, including JPEG compression, JPEG2000 compression, Gaussian blur, and Gaussian noise. The proposed algorithm is compared with six mainstream algorithms, including peak signal-to-noise ratio (PSNR), structural similarity (SSIM), craster parabolic projection PSNR (CPP-PSNR), weighted-to-spherically-uniform PSNR (WS-PSNR), spherical PSNR (S-PSNR), and weighted-to-spherically-uniform SSIM (WS-SSIM). The assessment criteria contain four indicators: Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SRCC), Kendall rank-order correlation coefficient (KRCC), and root of mean square error (RMSE).
In addition, we also list the performance obtained separately by the structural similarity based on panoramic weighted mutual information (PW-MI) and the texture similarity based on panoramic weighted local entropy (PW-LE), which shows that each factor plays a significant role in improving the performance. The experimental results show that the PLCC and SRCC indexes of the proposed algorithm are approximately 0.4 higher than those of the WS-SSIM algorithm, and the RMSE index is approximately 0.9 lower. All four indexes are the best compared with the six existing panoramic image-quality assessment algorithms. Meanwhile, the individual performance of PW-MI and PW-LE is also better than that of the reference panoramic algorithms. The algorithm not only solves the problem of inconsistency between the observation and processing spaces, but is also robust to different distortion types and achieves the best fitting effect. The human visual system has different sensitivities to different image scales, and experimental results show that the sampling scales with parameters of 2 and 4 perform better. Therefore, the mutual information of each order of phase consistency on the two scales and the local entropy of the first-order phase consistency are finally fused. High-order phase consistency has a negative effect on the calculation of similarity, and the proposed model performs best when using the local entropy of the first-order phase consistency. Conclusion The proposed algorithm solves the problem of inconsistency between the observation and processing spaces, and combines multi-scale mutual information similarity and local entropy similarity based on human eye perception to obtain an objective score that is more consistent with human perception. The assessment result is more accurate and consistent with the human visual system. The panoramic quality evaluation model proposed in this paper is a traditional (non-learning) algorithm.
With the development of deep learning, frameworks implemented by neural networks can also achieve high accuracy. Further experiments are needed to determine whether our model can be integrated into neural network-based panoramic quality assessment.
Key words
panoramic image/video; quality assessment; human visual system; phase consistency; structural similarity(SSIM); texture similarity
0 引言
全景图像/视频能带来全新的视觉体验,但是在采集、传输以及存储过程中难免引入失真。对全景图像/视频进行主观评价能够准确反映其质量,但是需要大量人力物力。因此,对于图像的质量评价,快速而又准确的客观质量评价模型具有重要作用。
一些客观的全景图像质量评价方法将传统的峰值信噪比(peak signal to noise ratio, PSNR)和结构相似性(structural similarity, SSIM)与全景图像的特性相结合。S-PSNR(spherical PSNR)(Yu等,2015)将球上一点s投影到参考图像和失真图像上,分别找到对应点并计算这两点之间的PSNR作为失真全景图像的质量。Zakharchenko等人(2017)使用CPP-PSNR (craster parabolic projection PSNR)将参考图像与失真图像同时投影到CPP(craster’s parabolic projection)面上,再进行对应点PSNR的计算。Sun等人(2017)提出WS-PSNR (weighted-to-spherically-uniform PSNR),利用球面与投影平面之间的映射关系改进PSNR。
全景图像显著性的研究也为全景图像质量评价提供了新的思路。Upenik和Ebrahimi(2019),Upenik等人(2016)利用视觉注意力机制,提出了基于关注度的全参考质量评估模型(visual attention based PSNR, VA-PSNR),将得到的全景显著性图像与传统PSNR进行结合。Yang等人(2017)提出了基于反向传播的全参考质量评估模型(back propagation-based quality assessment of panoramic videos in VR system, BP-QAVR)以衡量全景视频的质量。
Xu等人(2019b)提出基于非内容和基于内容的两种全参考全景视频质量评价模型,前者认为不同位置的像素产生的失真与人眼的关注区有关,后者将对视频内容预测的可能观看方向作为权重来衡量质量损失。Zhou等人(2018)采用SSIM,在考虑亮度、对比度和结构特征的基础上,将像素从球面映射到投影平面时的面积拉伸比作为权重,扩展成全景质量评价模型WS-SSIM (weighted-to-spherically-uniform SSIM)。
通过拼接实现的全景图像重建需要将多个视点图通过拼接算法合成为广角视图,因此这类全景图像的失真主要是几何失真和结构失真。Cheung等人(2017)利用光流来建立像素点之间的对应关系,将几何误差和畸变程度两部分特征进行融合来评估拼接图像的失真。Xu等人(2019a)提出了一种立体全景图像的全参考评价模型。许欣等人(2018)利用小波域的特征设计了一种半参考全景图像质量评价模型。上述几种模型的计算均在全景图像的投影平面上进行,没有考虑到处理平面与观察空间之间的非线性关系。
全景图像主观质量评价时通过辅助设备自由切换视角以观察整个球面场景,但传输、存储与处理过程都是对全景图像的投影格式进行处理,这就造成观察空间与处理空间不一致的问题,从而导致传统评价算法无法直接反映球面场景的失真程度。因此,利用观察空间与处理空间之间的映射关系,更有利于提升全景质量评价算法的准确性。
高分辨率的全景图像中结构和纹理信息非常丰富,且结构和纹理信息是人眼视觉系统理解场景内容的重要特征。因此,针对观察空间与处理平面不一致的问题,本文提出一种基于相位一致性的全参考全景图像质量评价模型(phase consistency based panoramic image quality assessment, PC-PIQA)。依据视觉系统对结构和纹理的敏感性,计算参考图像与失真图像四阶相位一致性之间互信息的相似度,来衡量结构信息的相似度;计算一阶相位一致性局部熵的相似度,来衡量纹理相似度,将两部分融合得到最终的质量分数。
1 基于相位一致性的全参考全景图像质量评价算法
本文提出基于相位一致性的全参考全景图像评价模型,其框图如图 1所示。首先将经纬图投影(equirectangular projection, ERP)格式(艾达等,2018)映射为立方体投影(cube map projection, CMP)格式(Greene,1986),利用CMP格式下的全景权重解决观察空间和处理空间不一致的问题。然后,对CMP中单个平面计算高阶相位一致性之间的互信息来描述参考图像与失真图像不同阶之间结构信息传递的相似度。此外,利用一阶相位一致性局部熵的相似度反映纹理的相似度。将两部分质量融合可得单个平面的视觉质量。最后,根据人眼对全景内容的关注度,分配给6个平面不同的感知权重得到整体的质量分数。
1.1 基于投影格式的全景权重
如图 2所示,全景投影格式ERP应用广泛,但其两极点处像素拉伸变形非常严重,将ERP格式转换成CMP格式可以有效减轻畸变,并且CMP的单个平面更接近人眼视觉系统的结构(Dedhia等,2019)。为了解决观察空间与处理平面之间的非线性映射关系,利用CMP格式下像素面积由球面投影为平面时产生的拉伸比作为全景权重。设CMP单个平面的边长为A,平面上像素点的坐标为(i, j),则该点的全景权重ω_cmp(i, j)定义为
$ \omega_{\mathrm{cmp}}(i, j)=\left(1+\frac{d^{2}(i, j)}{r^{2}}\right)^{-3 / 2} $ | (1) |
$ \begin{gathered} d^{2}(i, j)=(i+0.5-A / 2)^{2}+ \\ (j+0.5-A / 2)^{2} \end{gathered} $ | (2) |
式中,(i, j)为CMP平面上像素点的坐标,A为平面的边长,r为球体半径(r = A/2),d(i, j)为该像素点中心到平面中心的距离。
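原文未给出权重生成的实现细节,以下给出式(1)(2)的一个最小Python示意(其中假设球半径 r = A/2,平面边长 A 为示例参数),仅作原理说明,并非原文实现:

```python
import numpy as np

def cmp_weight(A, r=None):
    """按式(1)(2)生成CMP单个平面(边长A)的全景权重图。
    r为球半径,此处假设 r = A/2。"""
    if r is None:
        r = A / 2.0
    idx = np.arange(A)
    ii, jj = np.meshgrid(idx, idx, indexing='ij')
    # 式(2): 像素中心到平面中心的距离平方
    d2 = (ii + 0.5 - A / 2.0) ** 2 + (jj + 0.5 - A / 2.0) ** 2
    # 式(1): 球面投影到平面时的面积拉伸比
    return (1.0 + d2 / r ** 2) ** (-1.5)
```

权重在平面中心处最大、四角处最小,与球面到立方体平面投影时采样密度的变化一致。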
1.2 基于高阶相位一致性互信息的结构相似度
图像的边缘信息对于人眼视觉系统至关重要,人眼容易感受到图像结构信息的变化。传统的边缘检测大多数都是通过Sobel、Roberts、Canny和Laplacian等算子实现的,可以提取图像的结构特征,但是失真的加入则会改变这些结构信息,因此测量结构的失真程度是质量评价的重要方法。但这些梯度函数的计算与人眼对边缘信息的处理过程不同,不符合人类的感知视觉特性。Kovesi(1999)提出一种基于人类视觉特性的相位一致性图像边缘检测算子,表明人类视觉系统感知的图像特征集中在图像中各谐波分量相位最一致的点,且该算法可以不受图像局部光线明暗变化的影响。
将CMP单个平面记为I(x, y),利用全景权重对其加权,得到加权后的平面I′(x, y),即
$ \boldsymbol{I}^{\prime}(x, y)=\boldsymbol{I}(x, y) \cdot \omega_{\mathrm{cmp}} $ | (3) |
对加权后的平面计算相位一致性,位置x处的相位一致性P(x)定义为
$ P(x)=\frac{\sum\limits_{n=1}^{N} W(x)\left\lfloor A_{n}(x) \Delta \phi_{n}(x)-T\right\rfloor}{\sum\limits_{n=1}^{N} A_{n}(x)+\varepsilon} $ | (4) |
式中,A_n(x)和φ_n(x)分别为第n个频率分量在x处的幅值和相位,N为分量个数,W(x)为频率扩展的权重,T为噪声阈值,ε为防止分母为零的极小正数,⌊·⌋表示取值为正时保留原值、为负时置零。相位偏差函数Δφ_n(x)定义为
$ \begin{array}{c} \Delta \phi_{n}(x)=\cos \left(\phi_{n}(x)-\overline{\phi_{n}}(x)\right)- \\ \left|\sin \left(\phi_{n}(x)-\overline{\phi_{n}}(x)\right)\right| \end{array} $ | (5) |
式中,φ_n(x)为第n个分量在x处的局部相位,φ̄_n(x)为该处所有分量的加权平均相位。
如图 5所示,对比原始图像与失真图像的一阶相位一致性图可以看出,不同失真类型会导致相位一致性产生不同形式的改变。
计算单个平面的原始图像和失真图像的四阶相位一致性图,每一阶相位一致性图描述不同程度上的图像结构信息,这里使用各阶之间的互信息来表达基于相位一致性的特征。以第一阶和第二阶之间的互信息为例,设第一阶相位一致性图为P^1st,第二阶相位一致性图为P^2nd,二者的信息熵分别为
$ H\left(P^{1 \mathrm{st}}\right)=-\sum\limits_{m} P_{P ^{\mathrm{1st}}}(m) \log P_{P^{1 \mathrm{st}}}(m) $ | (6) |
$ H\left(P^{2 \mathrm{nd}}\right)=-\sum\limits_{n} P_{P ^{\mathrm{2nd}}}(n) \log P_{P^{2 \mathrm{nd}}}(n) $ | (7) |
式中,P_{P^1st}(m)和P_{P^2nd}(n)分别为P^1st和P^2nd的边缘概率分布。二者的联合熵为
$ \begin{gathered} H\left(P^{1 \mathrm{st}}, P^{2 \mathrm{nd}}\right)=-\sum\limits_{m, n} P_{P^{\mathrm{1st}}, {P}^{2 \mathrm{nd}}}(m, n) \cdot \\ \log P_{P^{1 \mathrm{st}}, P ^{2 \mathrm{nd}}}(m, n) \end{gathered} $ | (8) |
式中,P_{P^1st, P^2nd}(m, n)为P^1st和P^2nd的联合概率分布。由此,第一阶与第二阶相位一致性图之间的互信息为
$ \begin{gathered} M^{1 \mathrm{st}, 2 \mathrm{nd}}=H\left(P^{1 \mathrm{st}}\right)+H\left(P^{2 \mathrm{nd}}\right)- \\ H\left(P^{1 \mathrm{st}}, P^{2 \mathrm{nd}}\right) \end{gathered} $ | (9) |
同理可计算其他阶数之间的互信息,在权衡复杂度和性能的基础上仅使用3个互信息特征。参考图像单个平面的互信息特征集合记为G_r = {M_R^{1st,2nd}, M_R^{2nd,3rd}, M_R^{3rd,4th}},失真图像的互信息特征集合记为G_d = {M_D^{1st,2nd}, M_D^{2nd,3rd}, M_D^{3rd,4th}},二者的相似度定义为
$ Q_{\mathrm{MI}}=\frac{\sum\limits_{\boldsymbol{R} \in \boldsymbol{G}_{r}} \sum\limits_{\boldsymbol{D} \in \boldsymbol{G}_{d}}\left(\boldsymbol{M}_{\mathrm{R}}-\overline{\boldsymbol{M}_{\mathrm{R}}}\right)\left(\boldsymbol{M}_{\mathrm{D}}-\overline{\boldsymbol{M}_{\mathrm{D}}}\right)}{\sqrt{\left(\sum\limits_{\boldsymbol{R} \in \boldsymbol{G}_{r}}\left(\boldsymbol{M}_{\mathrm{R}}-\overline{\boldsymbol{M}_{\mathrm{R}}}\right)^{2}\right)\left(\sum\limits_{\boldsymbol{D} \in \boldsymbol{G}_{d}}\left(\boldsymbol{M}_{\mathrm{D}}-\overline{\boldsymbol{M}_{\mathrm{D}}}\right)^{2}\right)}} $ | (10) |
$ \overline{\boldsymbol{M}_{\mathrm{R}}}={mean}\left(\boldsymbol{M}_{\mathrm{R}}^{1 \mathrm{st}, 2 \mathrm{nd}}, \boldsymbol{M}_{\mathrm{R}}^{2 \mathrm{nd}, 3 \mathrm{rd}}, \boldsymbol{M}_{\mathrm{R}}^{3 \mathrm{rd}, 4 \mathrm{th}}\right) $ | (11) |
$ \overline{\boldsymbol{M}_{\mathrm{D}}}={mean}\left(\boldsymbol{M}_{\mathrm{D}}^{1 \mathrm{st}, 2 \mathrm{nd}}, \boldsymbol{M}_{\mathrm{D}}^{2 \mathrm{nd}, 3 \mathrm{rd}}, \boldsymbol{M}_{\mathrm{D}}^{3 \mathrm{rd}, 4 \mathrm{th}}\right) $ | (12) |
式中,M̄_R和M̄_D分别为参考图像和失真图像3个互信息特征的均值,G_r和G_d分别为二者的互信息特征集合。
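式(6)—(9)中各阶相位一致性图之间的互信息可由联合直方图估计得到,以下为一个最小Python示意(直方图bin数为示例假设):

```python
import numpy as np

def mutual_information(pc_a, pc_b, bins=64):
    """由联合直方图估计两幅相位一致性图之间的互信息,对应式(9)。"""
    joint, _, _ = np.histogram2d(pc_a.ravel(), pc_b.ravel(), bins=bins)
    p_joint = joint / joint.sum()          # 联合概率分布
    p_a = p_joint.sum(axis=1)              # 边缘概率分布
    p_b = p_joint.sum(axis=0)

    def entropy(p):                        # 式(6)—(8)中的熵
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    # 式(9): M = H(P^1st) + H(P^2nd) - H(P^1st, P^2nd)
    return entropy(p_a) + entropy(p_b) - entropy(p_joint.ravel())
```

两幅完全相同的图,互信息等于其自身的熵;两幅相互独立的图,互信息趋近于0。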
应用式(10)计算出CMP格式下6个平面的质量Q_MI^k(k = 1, 2, …, 6),加权融合得到基于互信息的结构相似度质量
$ \begin{gathered} Q_{\mathrm{M}}=\omega_{1} \cdot Q_{\mathrm{MI}}^{1}+\omega_{2} \cdot Q_{\mathrm{MI}}^{2}+\omega_{3} \cdot Q_{\mathrm{MI}}^{3}+ \\ \omega_{4} \cdot Q_{\mathrm{MI}}^{4}+\omega_{5} \cdot Q_{\mathrm{MI}}^{5}+\omega_{6} \cdot Q_{\mathrm{MI}}^{6} \end{gathered} $ | (13) |
式中,ω_1—ω_6为依据人眼对全景内容的关注度分配给6个平面的感知权重。
在不同尺度上观察图像时,人眼视觉系统会捕捉不同的内容,尺度变小时会更加关注图像的整体概貌,反之则更关注图像中的细节(Li等,2016)。为了获取不同尺度上的特征来模拟人眼对于尺度的感知特性,以参数2和4对图像进行下采样,两次下采样获取的不同尺度的相位一致性互信息相似度质量分别记为Q_M^{1/2}和Q_M^{1/4}。
1.3 基于相位一致性的局部熵的纹理相似度
图像信息熵(Ren等,2017)在图像恢复、边缘检测、目标检测和图像匹配等领域应用广泛。全局熵的大小反映了整幅图像包含的信息量,局部熵则反映了图像灰度的离散程度、图像的纹理分布情况。失真的引入会破坏图像中的纹理信息,因此使用局部熵的变化来衡量失真程度。
构建一阶相位一致性图的局部熵。设相位一致性图中像素点(i, j)处的取值为Θ(i, j),以点(x, y)为中心、大小为n×n(n为奇数)的窗口内的局部熵E(x, y)定义为
$ \begin{aligned} E(x, y)=-& \sum\limits_{i=x-(n-1) / 2}^{x+(n-1) / 2} \sum\limits_{j=y-(n-1) / 2}^{y+(n-1) / 2} p(\varTheta(i, j)) \times \\ & \log p(\varTheta(i, j)) \end{aligned} $ | (14) |
$ p(\varTheta(i, j))=\frac{\varTheta(i, j)}{\sum\limits_{i=x-(n-1) / 2}^{x+(n-1) / 2} \sum\limits_{j=y-(n-1) / 2}^{y+(n-1) / 2} \varTheta(i, j)} $ | (15) |
将局部熵算子在相位一致性图上遍历,获得一阶相位一致性的局部熵图。参考图像和失真图像的局部熵图分别记为E_R和E_D,二者的相似度定义为
$ Q_{\mathrm{LE}}=\frac{\sum\limits_{R \in \boldsymbol{G}_{r}} \sum\limits_{D \in \boldsymbol{G}_{d}}\left(E_{\mathrm{R}}-\overline{E_{\mathrm{R}}}\right)\left(E_{\mathrm{D}}-\overline{E_{\mathrm{D}}}\right)}{\sqrt{\left(\sum\limits_{R \in \boldsymbol{G}_{r}}\left(E_{\mathrm{R}}-\overline{E_{\mathrm{R}}}\right)^{2}\right)\left(\sum\limits_{D \in \boldsymbol{G}_{d}}\left(E_{\mathrm{D}}-\overline{E_{\mathrm{D}}}\right)^{2}\right)}} $ | (16) |
式中,Ē_R和Ē_D分别为参考图像和失真图像局部熵图的均值,G_r和G_d为对应的局部熵集合。
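式(14)—(16)的局部熵图及其相似度可示意如下(窗口大小n为示例假设;此处用双重循环直接实现,实际中可用滑动窗口技巧加速):

```python
import numpy as np

def local_entropy_map(pc, n=9):
    """式(14)(15): 以每个像素为中心的n×n窗口内,
    将窗口取值归一化为概率后计算局部熵。"""
    pad = n // 2
    padded = np.pad(pc, pad, mode='reflect')
    ent = np.zeros(pc.shape, dtype=float)
    eps = 1e-12
    for x in range(pc.shape[0]):
        for y in range(pc.shape[1]):
            w = padded[x:x + n, y:y + n]
            p = w / (w.sum() + eps)             # 式(15): 窗口内归一化为概率
            p = p[p > 0]
            ent[x, y] = -np.sum(p * np.log(p))  # 式(14)
    return ent

def entropy_similarity(e_r, e_d):
    """式(16): 参考与失真局部熵图的相关性相似度。"""
    a = e_r - e_r.mean()
    b = e_d - e_d.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2) + 1e-12)
```

当参考与失真局部熵图完全一致时相似度为1;失真破坏纹理分布后,相似度随之下降。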
应用式(16)计算出CMP格式下每个平面的质量Q_LE^k(k = 1, 2, …, 6),加权融合得到基于局部熵的纹理相似度质量
$ \begin{gathered} Q_{\mathrm{E}}=\omega_{1} \cdot Q_{\mathrm{LE}}^{1}+\omega_{2} \cdot Q_{\mathrm{LE}}^{2}+\omega_{3} \cdot Q_{\mathrm{LE}}^{3}+ \\ \omega_{4} \cdot Q_{\mathrm{LE}}^{4}+\omega_{5} \cdot Q_{\mathrm{LE}}^{5}+\omega_{6} \cdot Q_{\mathrm{LE}}^{6} \end{gathered} $ | (17) |
式中,ω_1—ω_6与式(13)中相同,为分配给6个平面的感知权重。
最后,将多尺度的基于高阶相位一致性互信息的质量与基于一阶相位一致性的局部熵的质量进行融合,即联合式(13)和(17)获得最终图像质量Q_all
$ \begin{gathered} Q_{\text {all }}=\omega_{7} \cdot\left(\omega_{9} \cdot Q_{\mathrm{M}}^{1 / 2}+\omega_{9} \cdot Q_{\mathrm{E}}^{1 / 2}\right)+ \\ \omega_{8} \cdot\left(\omega_{9} \cdot Q_{\mathrm{M}}^{1 / 4}+\omega_{9} \cdot Q_{\mathrm{E}}^{1 / 4}\right) \end{gathered} $ | (18) |
式中,参数ω_7和ω_8为1/2和1/4两个下采样尺度的权重,ω_9为融合权重,均由实验确定。
2 实验结果与分析
实验在全景图像质量评价(omnidirectional image quality assessment, OIQA)数据集(Cheung等,2017)上进行,共含336幅图像,包括16幅原始图像和320幅失真图像。16幅原始参考图像的分辨率在11 332×5 666像素与13 320×6 660像素之间,场景多样,如图 7所示。在原始全景图像中引入JPEG压缩、JPEG2000压缩、高斯模糊(Gaussian blur, GB)和高斯白噪声(Gaussian white noise, GN)4种不同类型的失真,每一种失真存在4种不同程度的失真情况,生成320幅失真图像。
表 1列出本文提出的PC-PIQA算法与主流算法的性能对比,包括两个传统算法PSNR(Horé和Ziou,2010)、SSIM(Wang等,2004)和4种主流全景算法CPP-PSNR(Zakharchenko等,2017)、WS-PSNR(Sun等,2017)、S-PSNR(Yu等,2015)、WS-SSIM(Zhou等,2018),最佳性能用粗体表示。除此之外,还展示了基于高阶相位一致性互信息(panoramic weighted-mutual information, PW-MI)的结构相似度与基于一阶相位一致性局部熵(panoramic weighted-local entropy, PW-LE)的纹理相似度单独获得的性能。
表 1
整体性能对比
Table 1
The comparison of overall performance
评价模型 | PLCC | SRCC | KRCC | RMSE |
PSNR(Horé和Ziou,2010) | 0.508 0 | 0.497 9 | 0.338 2 | 1.821 1 |
SSIM(Wang等,2004) | 0.249 2 | 0.348 3 | 0.437 3 | 1.901 4 |
CPP-PSNR(Zakharchenko等,2017) | 0.350 2 | 0.518 2 | 0.518 6 | 1.807 8 |
WS-PSNR(Sun等,2017) | 0.504 4 | 0.503 2 | 0.341 4 | 1.825 6 |
S-PSNR(Yu等,2015) | 0.531 9 | 0.530 3 | 0.358 8 | 1.790 4 |
WS-SSIM(Zhou等,2018) | 0.459 1 | 0.431 1 | 0.295 5 | 1.878 3 |
PW-MI | 0.627 6 | 0.622 1 | 0.439 1 | 1.646 0 |
PW-LE | 0.890 3 | 0.885 6 | 0.693 1 | 0.962 8 |
PC-PIQA | 0.892 2 | 0.889 3 | 0.697 1 | 0.954 7 |
注:加粗字体为每列最优结果。 |
采用4个常用的客观质量评价统计学指标:皮尔森线性相关系数(Pearson linear correlation coefficient, PLCC)、斯皮尔曼秩序相关系数(Spearman rank order correlation coefficient, SRCC)、肯德尔秩序相关系数(Kendall rank order correlation coefficient, KRCC)和均方根误差(root of mean square error, RMSE),计算式分别为
$ M_{\mathrm{PLCC}}=\frac{\sum\limits_{i=1}^{N}\left(s_{i}-\bar{s}\right)\left(p_{i}-\bar{p}\right)}{\sqrt{\sum\limits_{i=1}^{N}\left(s_{i}-\bar{s}\right)^{2}} \sqrt{\sum\limits_{i=1}^{N}\left(p_{i}-\bar{p}\right)^{2}}} $ | (19) |
式中,s_i为第i幅图像的主观分数,p_i为对应的客观预测分数,s̄和p̄分别为主观分数和客观分数的均值,N为图像数量。
$ M_{\mathrm{SRCC}}=1-\frac{6 \sum\limits_{i=1}^{N} d_{i}^{2}}{N\left(N^{2}-1\right)} $ | (20) |
式中,d_i为第i幅图像的主观分数与客观分数的秩之差,N为图像数量。
$ M_{\mathrm{KRCC}}=\frac{N_{c}-N_{d}}{0.5 N(N-1)} $ | (21) |
式中,N_c为一致对的数量,N_d为不一致对的数量,N为图像数量。
$ M_{\mathrm{RMSE}}=\sqrt{\frac{1}{N} \sum\limits_{i=1}^{N}\left(X_{i}-Y_{i}\right)^{2}} $ | (22) |
式中,X_i和Y_i分别为第i幅图像的主观分数和客观预测分数,N为图像数量。
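式(19)—(22)的4项统计指标可按如下Python代码计算(简化实现,未处理并列秩;实际评测中PLCC与RMSE通常在客观分数经非线性回归拟合后计算):

```python
import numpy as np

def rank(x):
    """无并列值时的秩(从1开始)。"""
    x = np.asarray(x, float)
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    return r

def plcc(s, p):                            # 式(19)
    s, p = np.asarray(s, float), np.asarray(p, float)
    a, b = s - s.mean(), p - p.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

def srcc(s, p):                            # 式(20)
    d = rank(s) - rank(p)
    n = len(d)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def krcc(s, p):                            # 式(21)
    s, p = np.asarray(s, float), np.asarray(p, float)
    n = len(s)
    nc = nd = 0
    for i in range(n):                     # 统计一致对N_c与不一致对N_d
        for j in range(i + 1, n):
            sign = (s[i] - s[j]) * (p[i] - p[j])
            if sign > 0:
                nc += 1
            elif sign < 0:
                nd += 1
    return (nc - nd) / (0.5 * n * (n - 1))

def rmse(s, p):                            # 式(22)
    s, p = np.asarray(s, float), np.asarray(p, float)
    return np.sqrt(np.mean((s - p) ** 2))
```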
从表 1可以得知:由于解决了观察空间和映射空间不一致的问题,并且融合了基于人眼感知的多尺度互信息相似度和局部熵相似度,提出的基于相位一致性的全景算法在4个指标上都达到了最佳;同时PW-MI和PW-LE的单独性能也高于这几种全景算法。此外,与基于结构信息的SSIM和WS-SSIM相比,提出的模型性能更优。图 8进一步分别给出平均意见分数(mean opinion score, MOS)与7个模型预测分数的拟合散点图。可以看出,图 8(g)代表的PC-PIQA相较于其他6种模型拟合得更好。
为了验证所提出的全参考全景图像质量评价算法PC-PIQA对不同失真类型具有鲁棒性,表 2中列出了4种全景算法和PC-PIQA在4种不同失真类型图像上4种指标的比较。
表 2
不同失真类型的性能对比
Table 2
Performance comparison of different distortion types
指标 | 失真类型 | CPP-PSNR | WS-PSNR | S-PSNR | WS-SSIM | PC-PIQA |
PLCC | JPEG | 0.568 3 | 0.736 8 | 0.752 6 | 0.790 3 | 0.886 8 |
PLCC | JPEG2000 | 0.704 0 | 0.717 7 | 0.726 4 | 0.740 1 | 0.900 4 |
PLCC | GN | 0.950 3 | 0.948 3 | 0.954 0 | 0.938 3 | 0.901 0 |
PLCC | GB | 0.498 0 | 0.496 1 | 0.521 2 | 0.423 2 | 0.921 2 |
SRCC | JPEG | 0.223 0 | 0.705 0 | 0.719 9 | 0.787 5 | 0.874 5 |
SRCC | JPEG2000 | 0.749 6 | 0.763 6 | 0.763 3 | 0.736 9 | 0.896 9 |
SRCC | GN | 0.922 8 | 0.930 1 | 0.918 2 | 0.922 3 | 0.891 8 |
SRCC | GB | 0.525 1 | 0.523 2 | 0.543 2 | 0.409 8 | 0.907 4 |
KRCC | JPEG | 0.456 0 | 0.509 5 | 0.524 1 | 0.595 7 | 0.681 9 |
KRCC | JPEG2000 | 0.557 3 | 0.572 5 | 0.574 4 | 0.549 7 | 0.716 9 |
KRCC | GN | 0.740 8 | 0.763 0 | 0.733 2 | 0.746 5 | 0.707 9 |
KRCC | GB | 0.362 6 | 0.363 8 | 0.377 8 | 0.280 1 | 0.733 7 |
RMSE | JPEG | 1.890 1 | 1.553 2 | 1.512 7 | 1.407 5 | 1.061 7 |
RMSE | JPEG2000 | 1.569 5 | 1.539 0 | 1.518 8 | 1.486 2 | 0.961 5 |
RMSE | GN | 0.586 0 | 0.597 2 | 0.563 8 | 0.650 7 | 0.816 4 |
RMSE | GB | 1.693 1 | 1.695 1 | 1.666 2 | 1.769 1 | 0.759 6 |
注:加粗字体为每列最优结果。 |
从表 2中可以看出,在高斯噪声这一类失真图像上,PC-PIQA所获得的性能比WS-PSNR和S-PSNR略低,但这4种全景算法在其他3种失真类型上的性能远差于提出的PC-PIQA。因此,综合4种失真类型的实验结果,提出的PC-PIQA获得的综合性能更好。
人眼视觉系统对不同尺度的图像具有不同的敏感度。如果尺度过大,容易忽略人眼对全局的感知,如果尺度过小,容易忽视局部细节。因此本文进行了下采样的实验,不同尺度下获得的性能对比如表 3所示。通过实验发现, 以参数为2和4采样的尺度获得的性能较佳,因此最终将两个尺度上的各阶相位一致性的互信息和一阶相位一致性的局部熵进行了融合。
表 3
不同尺度的性能对比
Table 3
Performance comparison of different scales
尺度 | PLCC | SRCC | KRCC | RMSE |
原尺度 | 0.835 0 | 0.830 9 | 0.628 8 | 1.163 4 |
1/2 | 0.871 9 | 0.868 0 | 0.670 2 | 1.035 4 |
1/4 | 0.882 1 | 0.879 8 | 0.685 2 | 0.996 0 |
1/8 | 0.805 2 | 0.803 3 | 0.597 0 | 1.253 7 |
1/16 | 0.565 2 | 0.567 2 | 0.393 6 | 1.744 2 |
注:加粗字体为每列最优结果。 |
最后,表 4展示了从一阶到五阶的基于局部熵相似度的质量的实验结果,可以看出,获得的性能随着阶数的不断增加在逐渐下降。因此可以认为,虽然高阶的相位一致性能够获得更加清晰的结构特征和纹理信息,但随着阶数的升高,被失真破坏的结构特征和纹理信息会对相似度的计算产生不利的影响,且局部熵会进一步放大这种不利影响,所以最终使用了一阶相位一致性的局部熵。
表 4
各阶相位一致性局部熵的影响
Table 4
Influence of phase consistency local entropy of each order
阶数 | PLCC | SRCC | KRCC | RMSE |
一阶 | 0.834 8 | 0.830 6 | 0.629 2 | 1.163 9 |
二阶 | 0.824 6 | 0.821 2 | 0.619 9 | 1.196 1 |
三阶 | 0.820 6 | 0.817 2 | 0.615 3 | 1.208 3 |
四阶 | 0.818 4 | 0.818 5 | 0.613 6 | 1.125 0 |
五阶 | 0.812 0 | 0.811 0 | 0.609 7 | 1.234 0 |
注:加粗字体为每列最优结果。 |
3 结论
本文提出一种基于相位一致性的全参考全景图像质量评价算法。首先采用基于人类视觉特性的相位一致性算子提取参考图像和失真图像结构特征相似度,然后利用一阶相位一致性局部熵的相似度反映参考图像和失真图像纹理的相似度,将两部分质量融合可得全景图像的客观质量分数。
在OIQA全景图像数据集上的实验结果表明,本文算法在4项评价指标都达到最佳结果,优于对比的全参考图像质量评估算法,与主观感受具有较高的一致性。该方法不但解决了观察空间和处理空间不一致的问题,而且对不同失真类型具有很好的鲁棒性,能够获得更好的拟合效果。
全景图像和视频的质量评价在虚拟现实(virtual reality, VR)技术及其应用的发展和普及中有着非常关键的作用,近年来成为多媒体技术领域研究的热点。随着深度学习的发展,深度学习网络所实现的框架同样也能获得较高的准确性。本文提出的全景质量评价模型属于传统算法,并未与深度学习方法进行比较。此外,该模型是否可以进一步融合到基于神经网络的全景质量评价中,还需要进一步论证和实验。
参考文献
- Ai D, Dong J J, Lin N and Liu Y. 2018. Advance of 360-degree video coding for virtual reality: a survey. Application Research of Computers, 35(6): 1606-1612 (艾达, 董久军, 林楠, 刘颖. 2018. 用于虚拟现实的360度视频编码技术新进展. 计算机应用研究, 35(6): 1606-1612) [DOI: 10.3969/j.issn.1001-3695.2018.06.002]
- Cheung G, Yang L Y, Tan Z G and Huang Z. 2017. A content-aware metric for stitched panoramic image quality assessment//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy: IEEE: 2487-2494 [DOI: 10.1109/ICCVW.2017.293]
- Dedhia B, Chiang J C and Char Y F. 2019. Saliency prediction for omnidirectional images considering optimization on sphere domain//Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brighton, UK: IEEE: 2142-2146 [DOI: 10.1109/ICASSP.2019.8683125]
- Greene N. 1986. Environment mapping and other applications of world projections. IEEE Computer Graphics and Applications, 6(11): 21-29 [DOI: 10.1109/MCG.1986.276658]
- Horé A and Ziou D. 2010. Image quality metrics: PSNR vs. SSIM//Proceedings of the 20th IEEE International Conference on Pattern Recognition (ICPR). Istanbul, Turkey: IEEE: 2366-2369 [DOI: 10.1109/ICPR.2010.579]
- Kovesi P. 1999. Image features from phase congruency. Videre: Journal of Computer Vision Research, 1(3): 1-26
- Lebreton P and Raake A. 2018. GBVS360, BMS360, ProSal: extending existing saliency prediction models from 2D to omnidirectional images. Signal Processing: Image Communication, 69: 69-78 [DOI: 10.1016/j.image.2018.03.006]
- Li Q H, Lin W S and Fang Y M. 2016. No-reference quality assessment for multiply-distorted images in gradient domain. IEEE Signal Processing Letters, 23(4): 541-545 [DOI: 10.1109/LSP.2016.2537321]
- Ren Y F, Sun L, Wu G W and Huang W Z. 2017. DIBR-synthesized image quality assessment based on local entropy analysis//Proceedings of 2017 IEEE International Conference on the Frontiers and Advances in Data Science (FADS). Xi'an, China: IEEE: 86-90 [DOI: 10.1109/FADS.2017.8253200]
- Sun Y L, Lu A and Yu L. 2017. Weighted-to-spherically-uniform quality evaluation for omnidirectional video. IEEE Signal Processing Letters, 24(9): 1408-1412 [DOI: 10.1109/LSP.2017.2720693]
- Upenik E and Ebrahimi T. 2019. Saliency driven perceptual quality metric for omnidirectional visual content//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 4335-4339 [DOI: 10.1109/ICIP.2019.8803637]
- Upenik E, Řeřábek M and Ebrahimi T. 2016. Testbed for subjective evaluation of omnidirectional visual content//Proceedings of 2016 IEEE Picture Coding Symposium (PCS). Nuremberg, Germany: IEEE: 1-5 [DOI: 10.1109/PCS.2016.7906378]
- Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
- Xu J H, Luo Z Y, Zhou W, Zhang W Y and Chen Z B. 2019a. Quality assessment of stereoscopic 360-degree images from multi-viewports//Proceedings of 2019 IEEE Picture Coding Symposium (PCS). Ningbo, China: IEEE: 1-5 [DOI: 10.1109/PCS48520.2019.8954555]
- Xu M, Li C, Chen Z Z, Wang Z L and Guan Z Y. 2019b. Assessing visual quality of omnidirectional videos. IEEE Transactions on Circuits and Systems for Video Technology, 29(12): 3516-3530 [DOI: 10.1109/TCSVT.2018.2886277]
- Xu X, Zhang H Q and Xia Z F. 2018. Quality assessment of 360-degree spherical images based on feature extraction in the wavelet domain. Video Engineering, 42(4): 36-40 (许欣, 张会清, 夏志方. 2018. 基于小波域特征提取的360度全景图像质量评价. 电视技术, 42(4): 36-40) [DOI: 10.16280/j.videoe.2018.04.007]
- Yang S, Zhao J Z, Jiang T T, Wang J, Rahim T, Zhang B, Xu Z J and Fei Z S. 2017. An objective assessment method based on multi-level factors for panoramic videos//Proceedings of 2017 IEEE Visual Communications and Image Processing (VCIP). St. Petersburg, USA: IEEE: 1-4 [DOI: 10.1109/VCIP.2017.8305133]
- Yu M, Lakshman H and Girod B. 2015. A framework to evaluate omnidirectional video coding schemes//Proceedings of 2015 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Fukuoka, Japan: IEEE: 31-36 [DOI: 10.1109/ISMAR.2015.12]
- Zakharchenko V, Choi K P, Alshina E and Park J H. 2017. Omnidirectional video quality metrics and evaluation process//Proceedings of 2017 IEEE Data Compression Conference (DCC). Snowbird, USA: IEEE: #472 [DOI: 10.1109/DCC.2017.90]
- Zhou Y F, Yu M, Ma H L, Shao H and Jiang G Y. 2018. Weighted-to-spherically-uniform SSIM objective quality evaluation for panoramic video//Proceedings of 2018 IEEE International Conference on Signal Processing (ICSP). Beijing, China: IEEE: 54-57 [DOI: 10.1109/ICSP.2018.8652269]
- Zhu Y C, Zhai G T and Min X K. 2018. The prediction of head and eye movement for 360 degree images. Signal Processing: Image Communication, 69: 15-25 [DOI: 10.1016/j.image.2018.05.010]