Self-adaptive semantic awareness network for blind image quality assessment

Chen Jian; Wan Jiaze; Lin Li; Li Zuoyong

Published: 2023-11-17
DOI: 10.11834/jig.220939
2023 | Volume 28 | Number 11

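Chen Jian¹,², Wan Jiaze¹, Lin Li¹, Li Zuoyong² (1. School of Electronic, Electrical Engineering and Physics, Fujian University of Technology, Fuzhou 350118, China; 2. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350121, China)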

Abstract

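Objective Blind image quality assessment (BIQA) is of practical importance in image quality control. Although existing BIQA methods achieve reasonable results on naturally distorted images, their accuracy still leaves room for improvement.

Method A BIQA method based on a self-adaptive semantic awareness network (SSA-Net) is proposed. It improves prediction accuracy by understanding the content of the distorted image and perceiving its distortion types. First, a deep convolutional neural network (DCNN) extracts semantic features at each stage, and a multi-head position attention (MPA) module aggregates long-range semantic information across the feature maps to strengthen the understanding of image content. Next, a self-adaptive feature awareness (SFA) module based on multi-scale kernels perceives the distortion types and, combined with the image content, captures both the global and the local distortions of the image. Finally, a multi-level supervision regression (MSR) network uses low-level semantic features to assist high-level semantic features in producing the predicted score.

Result The method is compared with 11 different methods on 7 databases. On the four natural distortion image databases LIVEC (LIVE in the Wild Image Quality Challenge), BID (blurred image database), KonIQ-10k (Konstanz authentic image quality 10k database), and SPAQ (smartphone photography attribute and quality), it attains Spearman rank order correlation coefficient (SRCC) values of 0.867, 0.877, 0.913, and 0.915, respectively, the best among all compared methods. It also ranks in the top two by SRCC on two synthetic distortion image databases. The experimental results show that the method outperforms other state-of-the-art methods on natural distortion image quality assessment databases.

Conclusion By combining an understanding of image content with the perception of different distortion types, the method adapts better to the distortions of natural images and improves assessment accuracy.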
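For readers unfamiliar with the reported metric, the following is a minimal sketch of how SRCC can be computed between predicted quality scores and subjective scores. It is a generic illustration, not the authors' evaluation code: the function names `average_ranks`, `pearson`, and `srcc` are hypothetical, and the sample scores are made up. SRCC is taken here as the Pearson correlation of the two rank vectors, with tied values sharing their mean rank.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, subjective):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Made-up scores for five images (illustrative only, not from the paper).
predicted = [62.1, 35.4, 80.3, 47.9, 55.0]
subjective = [60.0, 30.0, 85.0, 52.0, 50.0]
print(round(srcc(predicted, subjective), 3))
```

A value close to 1 means the predicted scores order the images almost exactly as human raters do, which is why SRCC is insensitive to any monotonic rescaling of the predictions.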

Keywords

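image quality assessment (IQA); blind image quality assessment (BIQA); deep learning; self-adaptive semantic awareness network (SSA-Net); multi-level supervision regression (MSR)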

Self-adaptive semantic awareness network for blind image quality assessment

Chen Jian¹,², Wan Jiaze¹, Lin Li¹, Li Zuoyong² (1. School of Electronic, Electrical Engineering and Physics, Fujian University of Technology, Fuzhou 350118, China; 2. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350121, China)

Abstract

Objective The rapid development of imaging technology over the past few decades has been accompanied by continuous updates in acquisition equipment and related technologies. However, image quality is susceptible to interference at various stages, including acquisition, processing, transmission, and storage, which introduces different types (e.g., JPEG2000 compression, JPEG compression, white Gaussian noise, Gaussian blur, fast fading distortion, and contrast distortion) and degrees of distortion that degrade image quality. Blind image quality assessment (BIQA) therefore has practical significance in image quality control and helps subsequent image processing and analysis. Although many methods achieve reasonable results in the blind quality assessment of degraded images, their accuracy warrants further improvement when dealing with the distortions of natural images. The challenges in assessing natural image distortions include the following: 1) natural image distortions are much more complex than synthetic ones because they contain not only global distortion (e.g., out of focus and Gaussian noise) but also local distortion (e.g., overexposure and motion blur), which increases the difficulty of quality assessment; 2) among the semantic features extracted by a deep convolutional neural network (DCNN), the lower-level features contain less semantic information and cannot provide a comprehensive understanding of the image, which hinders networks from coping with the distortions of natural images with diverse contents; and 3) although the high-level semantic features obtained by a DCNN contain rich semantic information, their lack of local detail easily makes the whole network overlook local distortions. To address these problems, this paper proposes a blind image quality assessment method called the self-adaptive semantic awareness network (SSA-Net).

Method First, images from different databases are not uniform in size and tend to be large, whereas deep-learning-based networks usually require input images of a fixed size. Therefore, every input image is randomly cropped 25 times, and the crops together represent the content of the original image. Second, to enable the network to extract rich semantic features, a 50-layer deep residual network (ResNet-50) with weights pre-trained on ImageNet is used to capture the semantic features of the images at each stage. Third, a multi-head position attention (MPA) module is designed to address the content diversity of naturally degraded images. Absolute position encoding is added to the attention so that fixed distortion position information is acquired, which improves the understanding of image content and the accuracy of the subsequent perception of distortion types. Fourth, a self-adaptive feature awareness (SFA) module is presented to address the diversity of distortion types in naturally degraded images. This module combines the understanding of image content with pooling kernels of different sizes to capture the global and local distortions in images. Fifth, a multi-level supervision regression (MSR) network with learnable parameters, which uses lower-level semantic features to assist the higher-level semantic features, is proposed to derive prediction scores that are in line with the human visual system.

Result Experiments are conducted on 7 databases with 11 different methods for comparison. The proposed method achieves the best performance on four natural distortion image databases, with Spearman rank order correlation coefficient (SRCC) values of 0.867, 0.877, 0.913, and 0.915 on the LIVE in the Wild Image Quality Challenge (LIVEC) database, the blurred image database (BID), the Konstanz authentic image quality 10k database (KonIQ-10k), and the smartphone photography attribute and quality (SPAQ) database, respectively. It also obtains the highest Pearson linear correlation coefficient (PLCC) values on these databases: 0.886, 0.881, 0.923, and 0.921. On two synthetic distortion image databases, the laboratory for image & video engineering (LIVE) database and the categorical subjective image quality (CSIQ) database, it obtains SRCC values that rank in the top two. In cross-database validation, SSA-Net achieves competitive results across several natural distortion image quality databases and reasonable results between synthetic and natural distortion databases. SSA-Net also generalizes better than the self-adaptive hyper network and the visual compensation restoration network on the Waterloo Exploration database. Experimental results show that the proposed method outperforms the state-of-the-art methods on natural distortion image quality assessment databases and demonstrates stronger generalization performance.

Conclusion The proposed method acquires accurate image distortion information by combining the understanding of image content with the perception of different distortion types. The network fuses information from different stages through an improved deep supervision mechanism with learnable parameters, which lets it adapt efficiently to the distortions of natural images and thereby improves image quality assessment accuracy.
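The 25-crop input step described in the Method part can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract gives only the number of crops, so the patch size, the names `random_crops` and `image_score`, and the choice to average the per-patch scores into one image score are all assumptions. The image is modeled as a plain 2D list to keep the example dependency-free.

```python
import random


def random_crops(image, patch_h, patch_w, n_crops=25, seed=None):
    """Cut n_crops patches at uniformly random positions from a 2D image."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch is larger than the image")
    patches = []
    for _ in range(n_crops):
        top = rng.randint(0, height - patch_h)    # inclusive bounds
        left = rng.randint(0, width - patch_w)
        patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
        patches.append(patch)
    return patches


def image_score(image, patch_scorer, patch_h, patch_w, n_crops=25, seed=None):
    """Assumed aggregation: mean of the per-patch scores."""
    patches = random_crops(image, patch_h, patch_w, n_crops, seed)
    return sum(patch_scorer(p) for p in patches) / len(patches)


# Toy 8x10 "image" and a stand-in scorer (mean pixel value), both hypothetical.
toy_image = [[r * 10 + c for c in range(10)] for r in range(8)]


def mean_pixel(patch):
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


print(image_score(toy_image, mean_pixel, patch_h=4, patch_w=5, seed=0))
```

In a real pipeline the stand-in scorer would be replaced by the trained network, and fixing the seed would only matter for reproducible evaluation.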

Keywords

image quality assessment (IQA); blind image quality assessment (BIQA); deep learning; self-adaptive semantic awareness network (SSA-Net); multi-level supervision regression (MSR)