Self-adaptive semantic awareness network for blind image quality assessment
2023, Vol. 28, No. 11, Pages: 3400-3414
Print publication date: 2023-11-16
DOI: 10.11834/jig.220939
Chen Jian, Wan Jiaze, Lin Li, Li Zuoyong. 2023. Self-adaptive semantic awareness network for blind image quality assessment. Journal of Image and Graphics, 28(11):3400-3414
Objective
Blind image quality assessment (BIQA) has important practical significance in the field of image quality control. Although BIQA of naturally distorted images has achieved reasonable results, its accuracy still needs further improvement.
Method
This paper proposes a BIQA method based on a self-adaptive semantic awareness network (SSA-Net), which improves prediction accuracy by understanding the content of distorted images and perceiving their distortion types. First, a deep convolutional neural network (DCNN) extracts the semantic features of each stage, and a multi-head position attention (MPA) module is proposed to strengthen the understanding of image content by aggregating long-range semantic information from the feature maps. Next, a self-adaptive feature awareness (SFA) module based on multi-scale kernels is proposed to perceive the distortion type of an image and, combined with the image content, to capture both global and local distortions. Finally, a multi-level supervision regression (MSR) network is proposed to derive prediction scores by using low-level semantic features to assist high-level semantic features.
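To make the long-range aggregation step concrete, the following is a minimal PyTorch sketch of one way a multi-head attention block with learned absolute position embeddings could operate on a feature map. The head count, feature-map size, and residual/normalization layout are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MPABlock(nn.Module):
    """Sketch of a multi-head position attention block: learned absolute
    position embeddings are added to the flattened feature map before
    standard multi-head self-attention aggregates long-range context.
    All sizes here are illustrative, not taken from the paper."""

    def __init__(self, channels: int, height: int, width: int, num_heads: int = 8):
        super().__init__()
        # One learnable absolute position embedding per spatial location.
        self.pos_embed = nn.Parameter(torch.zeros(1, height * width, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        tokens = tokens + self.pos_embed          # inject absolute positions
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)     # residual + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Example: attend over a stage-3 ResNet-style feature map of size 14 x 14.
feat = torch.randn(2, 1024, 14, 14)
out = MPABlock(channels=1024, height=14, width=14)(feat)
print(out.shape)  # torch.Size([2, 1024, 14, 14])
```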
Result
The proposed method is compared with 11 different methods on 7 databases. It achieves Spearman rank order correlation coefficient (SRCC) values of 0.867, 0.877, 0.913, and 0.915 on four natural distortion image databases, namely LIVEC (LIVE in the Wild Image Quality Challenge), BID (blurred image database), KonIQ-10k (Konstanz authentic image quality 10k database), and SPAQ (smartphone photography attribute and quality), the best performance among all compared methods. It also ranks in the top two by SRCC on two synthetic distortion image databases. Experimental results show that, compared with other state-of-the-art methods, the proposed method performs better on natural distortion image quality assessment databases.
Conclusion
By combining image content understanding with the perception of different distortion types, the proposed method adapts better to the distortions of natural images and improves assessment accuracy.
Objective
The rapid development of imaging technology has been accompanied by continuous updates in acquisition equipment and related technologies over the past few decades. However, image quality is susceptible to interference at various stages, including acquisition, processing, transmission, and storage, which introduces different types (e.g., JPEG2000 compression, JPEG compression, white Gaussian noise, Gaussian blur, fast fading distortion, and contrast distortion) and degrees of distortion that degrade image quality. Therefore, blind image quality assessment (BIQA) has practical significance in the field of image quality control and is helpful for subsequent image processing and analysis. Although many existing methods have achieved reasonable results in the blind quality assessment of degraded images, their accuracy warrants further improvement when dealing with the distortions of natural images. The challenges in assessing natural image distortions include the following: 1) natural image distortions are much more complex than synthetic ones because they contain not only global distortions (e.g., out-of-focus blur and Gaussian noise) but also local distortions (e.g., overexposure and motion blur), which increases the difficulty of image quality assessment; 2) among the semantic features extracted by a deep convolutional neural network (DCNN), the lower-level features contain less semantic information and cannot provide a comprehensive understanding of the image content, which prevents networks from coping with the distortions of natural images with diverse contents; and 3) although the high-level semantic features obtained by a DCNN contain rich semantic information, their lack of local detail easily makes the whole network overlook local distortions. To address these problems, this paper proposes a blind image quality assessment method called self-adaptive semantic awareness network (SSA-Net).
Method
First, images from different databases are not uniform in size and tend to be large, whereas deep-learning-based networks usually require a fixed input size. Therefore, each input image is randomly cropped 25 times to represent the content of the original image. Second, to enable the network to extract rich semantic features, a 50-layer deep residual network (ResNet-50) pre-trained on ImageNet is leveraged for feature extraction and captures the semantic features of the image at each stage. Third, a multi-head position attention (MPA) module is designed to address the content diversity of naturally degraded images; by adding absolute position encoding to the multi-head attention, it acquires fixed positional information about distortions, improving the understanding of image content and the accuracy of the subsequent perception of distortion types. Fourth, a self-adaptive feature awareness (SFA) module is presented to address the diversity of distortion types in naturally degraded images. This module combines the understanding of image content with pooling kernels of different sizes to capture the global and local distortions in images. Fifth, a multi-level supervision regression (MSR) network with learnable parameters, in which lower-level semantic features assist the higher-level semantic features, is proposed to derive prediction scores that are in line with the human visual system.
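As an illustration of how these pieces might be wired together, the following PyTorch sketch feeds ImageNet-pretrained ResNet-50 stage features through a multi-scale-pooling block (standing in for SFA) and a learnable-weight multi-level regressor (standing in for MSR). The module definitions, pooling scales, and softmax blending rule are assumptions for illustration only, and a single crop is run where the paper averages scores over 25 random crops.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class MultiScalePool(nn.Module):
    """SFA-flavoured stand-in: pool the same feature map at several scales so
    that global statistics and local detail both survive, then mix the
    branches back with a 1x1 convolution. The scales are illustrative."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        branches = [
            F.interpolate(F.adaptive_avg_pool2d(x, s), size=(h, w), mode="nearest")
            for s in self.scales
        ]
        return self.fuse(torch.cat(branches, dim=1))

class MultiLevelRegressor(nn.Module):
    """MSR-flavoured stand-in: every backbone stage produces its own score,
    and softmax-normalised learnable weights blend the lower-level
    predictions into the final one (the blending rule is an assumption)."""
    def __init__(self, stage_channels=(256, 512, 1024, 2048)):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(c, 1) for c in stage_channels])
        self.level_weights = nn.Parameter(torch.ones(len(stage_channels)))

    def forward(self, stage_feats):
        # Global-average-pool each stage, then score it with its own head.
        scores = [head(f.mean(dim=(2, 3)))
                  for head, f in zip(self.heads, stage_feats)]
        w = torch.softmax(self.level_weights, dim=0)
        return sum(wi * s for wi, s in zip(w, scores))  # (B, 1)

# ImageNet-pretrained ResNet-50, split into its stem and four stages.
backbone = resnet50(weights="IMAGENET1K_V1")
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
stages = [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]
pools = [MultiScalePool(c) for c in (256, 512, 1024, 2048)]
regressor = MultiLevelRegressor()

x = torch.randn(2, 3, 224, 224)  # stand-in for one random crop per image
feats, h = [], stem(x)
for stage, pool in zip(stages, pools):
    h = stage(h)
    feats.append(pool(h))
print(regressor(feats).shape)  # torch.Size([2, 1])
```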
Result
Experiments are conducted on 7 databases against 11 different methods. The proposed method achieves the best performance on four natural distortion image databases, with Spearman rank order correlation coefficient (SRCC) values of 0.867, 0.877, 0.913, and 0.915 on the LIVE in the Wild Image Quality Challenge (LIVEC) database, the blurred image database (BID), the Konstanz authentic image quality 10k database (KonIQ-10k), and the smartphone photography attribute and quality (SPAQ) database, respectively. It also obtains the highest Pearson linear correlation coefficient (PLCC) values of 0.886, 0.881, 0.923, and 0.921 on these databases, and ranks in the top two by SRCC on two synthetic distortion image databases, namely the Laboratory for Image & Video Engineering (LIVE) database and the categorical subjective image quality (CSIQ) database. In cross-database validation, SSA-Net achieves competitive results on several natural distortion image quality databases and reasonable results on synthetic/natural image quality assessment databases. SSA-Net also shows better generalization than the self-adaptive hyper network and the visual compensation restoration network on the Waterloo Exploration database. Experimental results show that the proposed method outperforms state-of-the-art methods on natural distortion image quality assessment databases and has stronger generalization performance.
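For reference, the two correlation metrics quoted above can be computed with scipy; the scores below are made-up values for illustration, and evaluation protocols often fit a nonlinear logistic mapping between predictions and subjective scores before computing PLCC.

```python
# Illustrative only: made-up predicted scores vs. subjective MOS values.
import numpy as np
from scipy import stats

mos  = np.array([72.1, 45.3, 88.0, 30.5, 61.7])  # ground-truth opinion scores
pred = np.array([70.4, 48.9, 85.2, 33.0, 64.1])  # model predictions

srcc, _ = stats.spearmanr(pred, mos)  # monotonic (rank-order) agreement
plcc, _ = stats.pearsonr(pred, mos)   # linear agreement
print(f"SRCC = {srcc:.3f}, PLCC = {plcc:.3f}")
```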
Conclusion
The proposed method acquires accurate image distortion information by combining the understanding of image content with the perception of different distortion types. Through an improved deep supervision mechanism and learnable parameters, the network fuses information from different stages, efficiently adapts to the distortions of natural images, and thereby improves image quality assessment accuracy.
Keywords: image quality assessment (IQA); blind image quality assessment (BIQA); deep learning; self-adaptive semantic awareness network (SSA-Net); multi-level supervision regression (MSR)
Bare B, Li K and Yan B. 2017. An accurate deep convolutional neural networks model for no-reference image quality assessment//Proceedings of 2017 IEEE International Conference on Multimedia and Expo. Hong Kong, China: IEEE: 1356-1361 [DOI: 10.1109/ICME.2017.8019508]
Bosse S, Maniry D, Müller K R, Wiegand T and Samek W. 2018. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Transactions on Image Processing, 27(1): 206-219 [DOI: 10.1109/TIP.2017.2760518]
Chen J, Li S Y, Lin L, Wang M and Li Z Y. 2022. A review on no-reference quality assessment for blurred image. Acta Automatica Sinica, 48(3): 689-711 [DOI: 10.16383/j.aas.c201030]
Chen Y, Wu M M, Fang H and Liu H L. 2020. No-reference image quality assessment based on differential excitation. Acta Automatica Sinica, 46(8): 1727-1737 [DOI: 10.16383/j.aas.c180088]
Ciancio A, Da Costa A L N T, De Silva E A B, Said A, Samadani R and Obrador P. 2011. No-reference blur assessment of digital pictures based on multifeature classifiers. IEEE Transactions on Image Processing, 20(1): 64-75 [DOI: 10.1109/TIP.2010.2053549]
Fang Y M, Zhu H W, Zeng Y, Ma K D and Wang Z. 2020. Perceptual quality assessment of smartphone photography//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3674-3683 [DOI: 10.1109/CVPR42600.2020.00373]
Gao M J, Dang H S, Wei L L, Wang H L and Zhang X D. 2020. Combining global and local variation for image quality assessment. Acta Automatica Sinica, 46(12): 2662-2671 [DOI: 10.16383/j.aas.c190697]
Ghadiyaram D and Bovik A C. 2016. Massive online crowdsourced study of subjective and objective picture quality. IEEE Transactions on Image Processing, 25(1): 372-387 [DOI: 10.1109/TIP.2015.2500021]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034 [DOI: 10.1109/ICCV.2015.123]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hosu V, Lin H H, Sziranyi T and Saupe D. 2020. KonIQ-10k: an ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing, 29: 4041-4056 [DOI: 10.1109/TIP.2020.2967829]
Kang L, Ye P, Li Y and Doermann D. 2014. Convolutional neural networks for no-reference image quality assessment//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 1733-1740 [DOI: 10.1109/CVPR.2014.224]
Kim J and Lee S. 2017. Fully deep blind image quality predictor. IEEE Journal of Selected Topics in Signal Processing, 11(1): 206-220 [DOI: 10.1109/JSTSP.2016.2639328]
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 1097-1105
Larson E C and Chandler D M. 2010. Most apparent distortion: full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1): #11006 [DOI: 10.1117/1.3267105]
Lee C Y, Xie S N, Gallagher P W, Zhang Z Y and Tu Z W. 2015. Deeply-supervised nets//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR.org: 562-570
Li B W, Tian M, Zhang W X and Wang X P. 2021. Blind image quality assessment based on sparse representation of multi-level information. Journal of Huazhong University of Science and Technology (Nature Science Edition), 49(8): 40-45 [DOI: 10.13245/j.hust.210808]
Li D Q, Jiang T T, Lin W S and Jiang M. 2019. Which has better visual quality: the clear blue sky or a blurry animal? IEEE Transactions on Multimedia, 21(5): 1221-1234 [DOI: 10.1109/TMM.2018.2875354]
Ma K D, Duanmu Z F, Wu Q B, Wang Z, Yong H W, Li H L and Zhang L. 2017. Waterloo exploration database: new challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2): 1004-1016 [DOI: 10.1109/TIP.2016.2631888]
Ma K D, Wu Q B, Wang Z, Duanmu Z F, Yong H, Li H L and Zhang L. 2016. Group MAD competition? A new methodology to compare objective image quality models//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1664-1673 [DOI: 10.1109/CVPR.2016.184]
Pan Z Q, Yuan F, Lei J J, Fang Y M, Shao X and Kwong S. 2022. VCRNet: visual compensation restoration network for no-reference image quality assessment. IEEE Transactions on Image Processing, 31: 1613-1627 [DOI: 10.1109/TIP.2022.3144892]
Qureshi M A, Deriche M and Beghdadi A. 2016. Quantifying blur in colour images using higher order singular values. Electronics Letters, 52(21): 1755-1757 [DOI: 10.1049/el.2016.1792]
Ren H Y, Chen D Q and Wang Y Z. 2018. RAN4IQA: restorative adversarial nets for no-reference image quality assessment//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: #12258 [DOI: 10.1609/aaai.v32i1.12258]
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520 [DOI: 10.1109/CVPR.2018.00474]
Sheikh H R, Sabir M F and Bovik A C. 2006. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11): 3440-3451 [DOI: 10.1109/TIP.2006.881959]
Song T S, Li L D, Zhu H C and Qian J S. 2021. IE-IQA: intelligibility enriched generalizable no-reference image quality assessment. Frontiers in Neuroscience, 15: #739138 [DOI: 10.3389/fnins.2021.739138]
Su S L, Yan Q S, Zhu Y, Zhang C, Ge X, Sun J Q and Zhang Y N. 2020. Blindly assess image quality in the wild guided by a self-adaptive hyper network//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3664-3673 [DOI: 10.1109/CVPR42600.2020.00372]
Wang P Q, Chen P F, Yuan Y, Liu D, Huang Z H, Hou X D and Cottrell G. 2018a. Understanding convolution for semantic segmentation//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe, USA: IEEE: 1451-1460 [DOI: 10.1109/WACV.2018.00163]
Wang X L, Girshick R, Gupta A and He K M. 2018b. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wu H S, Wang W, Zhong J F, Lei B Y, Wen Z K and Qin J. 2021. SCS-Net: a scale and context sensitive network for retinal vessel segmentation. Medical Image Analysis, 70: #102025 [DOI: 10.1016/j.media.2021.102025]
Yan J B, Fang Y M and Liu X L. 2022. The review of distortion-related image quality assessment. Journal of Image and Graphics, 27(5): 1430-1466 [DOI: 10.11834/jig.210790]
Zeng H, Zhang L and Bovik A C. 2017. A probabilistic quality representation approach to deep blind image quality prediction [EB/OL]. [2022-09-26]. https://arxiv.org/pdf/1708.08190.pdf
Zhang F Y and Roysam B. 2016. Blind quality metric for multidistortion images based on cartoon and texture decomposition. IEEE Signal Processing Letters, 23(9): 1265-1269 [DOI: 10.1109/LSP.2016.2594166]
Zhang W X, Ma K D, Yan J, Deng D X and Wang Z. 2020. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Transactions on Circuits and Systems for Video Technology, 30(1): 36-47 [DOI: 10.1109/TCSVT.2018.2886771]
Zhang W X, Ma K D, Zhai G T and Yang X K. 2021. Uncertainty-aware blind image quality assessment in the laboratory and wild. IEEE Transactions on Image Processing, 30: 3474-3486 [DOI: 10.1109/TIP.2021.3061932]
Zhu H C, Li L D, Wu J J, Dong W S and Shi G M. 2020. MetaIQA: deep meta-learning for no-reference image quality assessment//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14131-14140 [DOI: 10.1109/CVPR42600.2020.01415]
Zhu H C, Li L D, Wu J J, Dong W S and Shi G M. 2022. Generalizable no-reference image quality assessment via deep meta-learning. IEEE Transactions on Circuits and Systems for Video Technology, 32(3): 1048-1060 [DOI: 10.1109/TCSVT.2021.3073410]
Zhu Z, Xu M D, Bai S, Huang T T and Bai X. 2019. Asymmetric non-local neural networks for semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 593-602 [DOI: 10.1109/ICCV.2019.00068]