
鄢杰斌, 谭湽文, 吴康诚, 刘学林, 方玉明(江西财经大学)

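Abstract
Objective Omnidirectional image quality assessment (OIQA) aims to quantitatively describe the degradation of omnidirectional images and plays an important role in algorithm improvement and system optimization. Early OIQA methods mainly combine the geometric characteristics of omnidirectional images (such as polar distortion and unevenly distributed semantics) with 2D-IQA methods. They do not consider user viewing behavior, so their performance is mediocre. Existing OIQA methods mainly simulate user viewing behavior to extract a viewport sequence, compute the distortion of that sequence, and then fuse the viewport distortions into the global quality of the omnidirectional image. However, predicting the viewport sequence is difficult, and the real-time performance and robustness of the prediction model are hard to guarantee. To address these problems, this paper proposes a viewport-independent, deformation-resistant no-reference (NR) OIQA (NR-OIQA) model. To handle the regular geometric deformation introduced by the equirectangular projection (ERP) of omnidirectional images, this paper proposes a new convolution method, called equirectangular deformable convolution, that handles irregular semantics and regular deformation simultaneously, and builds the NR-OIQA model on it.
Method The model consists of three components: a prior-guided patch sampling (PPS) module, an adaptive deformation feature extraction (ADFE) module, and an intra-inter patch attention aggregation (A-EPAA) module. The PPS module samples patches of identical resolution from the high-resolution omnidirectional image according to a prior probability distribution. The ADFE module progressively extracts quality-related features from the input patches through equirectangular deformable convolution. The A-EPAA module adjusts the features within each patch and the influence of each patch on the overall quality assessment, so as to improve the accuracy of the model.
Result The proposed model is compared with other IQA models on three public datasets. Compared with VGCN, which ranks first in performance, the number of parameters is reduced by 78.9% and the computation by 94.5%. Compared with MC360IQA, which ranks third, the Spearman correlation coefficient improves by 1.9%, 1.7%, and 4.3% on the CVIQ, OIQA, and JUFE datasets, respectively, and the number of parameters and the computation are reduced by 75% and 65%, respectively.
Conclusion The proposed NR-OIQA model fully considers the characteristics of omnidirectional images. It efficiently extracts distortion-aware quality features without relying on viewports, assesses the quality of omnidirectional images accurately, and has a low computational cost.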
Viewport-independent and deformation-unaware no-reference omnidirectional image quality assessment

Yan Jiebin, Tan Ziwen, Wu Kangcheng, Liu Xuelin, Fang Yuming (Jiangxi University of Finance and Economics)

Objective With the rapid development of the virtual reality (VR) industry, the omnidirectional image has become an important medium for visual representation in VR, and it may be degraded in stages such as acquisition, transmission, processing, and storage. Omnidirectional image quality assessment (OIQA) aims to quantitatively describe this degradation and plays a crucial role in algorithm improvement and system optimization. Omnidirectional images have inherent characteristics, namely geometric deformation in the polar regions and semantic information concentrated in the equatorial region. In addition, viewing behavior can noticeably affect the perceptual quality of an omnidirectional image. Early OIQA methods simply fuse these inherent characteristics into 2D-IQA and do not consider user viewing behavior, so they obtain suboptimal performance. Because the viewport representation is consistent with user viewing behavior, some recent deep-learning-based OIQA methods achieve promising performance by taking a predicted viewport sequence as the model input and computing its degradation. However, predicting the viewport sequence is difficult, and viewport extraction requires a series of pixel-wise computations, which leads to a significant computational load and hampers application in industrial settings. To address these problems, we propose a new no-reference OIQA model built on an equirectangular modulated deformable convolution (EquiMdconv), which handles the irregular semantics and the regular deformation caused by equirectangular projection (ERP) simultaneously, without a predicted viewport sequence.
Method We propose a viewport-independent and deformation-unaware no-reference OIQA model.
Our model is composed of three parts: a prior-guided patch sampling (PPS) module, an adaptive deformable feature extraction (ADFE) module, and an intra-inter patch attention aggregation (A-EPAA) module. The PPS module samples a set of patch images according to a prior probability distribution, in a slice-based manner, to represent the quality information of the complete image. The ADFE module extracts the perceptual quality features of the input patch images while accounting for irregular semantics and regular deformation. It contains eight blocks, each comprising an EquiMdconv layer, a 1 × 1 convolutional layer, a batch normalization layer, and a 3 × 3 max pooling layer. The EquiMdconv layer employs a modulated deformable convolution layer, whose learnable offset parameters model distortions in the images more accurately. Furthermore, we add fixed offsets based on distortion regularity factors to the deformable convolution's offsets to effectively eliminate the regular deformation. The A-EPAA module comprises a convolutional block attention module (CBAM) and a patch attention (PA) module. The CBAM assigns weights that adjust the perceptual quality features in both the channel and spatial dimensions. The PA module adjusts the contribution weight of each patch image to the overall quality assessment. We train the proposed model on the CVIQ, OIQA, and JUFE databases. We split each database into two parts: 80% for training and 20% for testing. We sample 10 patch images from each omnidirectional image, and the patch size is set to 224 × 224. All experiments are implemented on a server with an NVIDIA RTX A5000 GPU. The adaptive moment estimation (Adam) optimizer is used to optimize our model. We train the model for 300 epochs on the CVIQ and OIQA databases and for 20 epochs on the JUFE database, with a learning rate of 0.0001 and a batch size of 16.
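The fixed-offset idea in the Method description can be illustrated with a minimal Python sketch. It assumes the simplest latitude model of ERP, in which an image row at latitude φ is stretched horizontally by 1/cos φ, and it widens the horizontal sampling positions of a 3 × 3 kernel by that factor. The function names, the `max_stretch` clamp, and the offset formula are illustrative assumptions, not the formulation used in the paper; in a deformable convolution, such fixed offsets would be added to the learned ones.

```python
import math


def row_latitude(y, height):
    """Latitude (radians) of the centre of pixel row y in an ERP image.

    Row 0 is the top of the image (near the north pole).
    """
    return (0.5 - (y + 0.5) / height) * math.pi


def fixed_offsets(y, height, kernel=3, max_stretch=8.0):
    """Fixed (dy, dx) offsets for each tap of a kernel centred on row y.

    Assumption: ERP stretches a row at latitude phi horizontally by
    1 / cos(phi), so a tap at horizontal position kx is moved to
    kx * stretch, i.e. its offset is kx * (stretch - 1). The stretch is
    clamped because it grows without bound towards the poles.
    """
    phi = row_latitude(y, height)
    stretch = min(1.0 / math.cos(phi), max_stretch)
    half = kernel // 2
    offsets = []
    for ky in range(-half, half + 1):        # kernel rows, top to bottom
        for kx in range(-half, half + 1):    # kernel columns, left to right
            offsets.append((0.0, kx * (stretch - 1.0)))
    return offsets


if __name__ == "__main__":
    for row in (0, 512, 1023):
        right_tap = fixed_offsets(row, 2048)[5]  # tap at (ky=0, kx=+1)
        print(row, right_tap)
```

In an image three rows tall, the top row sits at latitude 60°, so the stretch is 2 and the right-hand tap of the middle kernel row moves one extra pixel outward. Near the equator the offsets vanish, and the kernel behaves like an ordinary convolution.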
Result We conduct experiments on three databases: CVIQ, OIQA, and JUFE. We evaluate the proposed model by comparing it with nine viewport-independent models and three viewport-dependent models, using the Pearson linear correlation coefficient (PLCC) and the Spearman rank correlation coefficient (SRCC) as performance criteria. Compared with the state-of-the-art (SOTA) viewport-dependent model, VGCN, our model reduces the number of parameters by 78.9% and the floating point operations (FLOPs) by 94.5%. Compared with MC360IQA, the SRCC increases by 1.9%, 1.7%, and 4.3% on the CVIQ, OIQA, and JUFE databases, respectively, and the parameters and FLOPs are reduced by 75% and 65%, respectively.
Conclusion The proposed viewport-independent and deformation-unaware no-reference OIQA model thoroughly considers the characteristics of omnidirectional images. It effectively extracts quality features and accurately assesses the quality of omnidirectional images at a low computational cost.
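The two criteria named in the Result paragraph, PLCC and SRCC, can be computed as in the following minimal Python sketch. The score lists are hypothetical values chosen for illustration, not data from the paper, and the sketch omits the nonlinear mapping that IQA studies often apply to predictions before computing PLCC.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): PLCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical predicted scores and subjective scores for five images.
    predicted = [1.2, 2.9, 3.1, 4.8, 4.0]
    subjective = [1.0, 3.0, 2.5, 5.0, 4.5]
    print("PLCC:", pearson(predicted, subjective))
    print("SRCC:", spearman(predicted, subjective))
```

SRCC depends only on the ordering of the scores, so it measures prediction monotonicity, whereas PLCC measures how close the relationship is to linear.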