Scene-assisted image aesthetic attribute assessment
Journal of Image and Graphics, Vol. 27, Issue 11, Pages 3199-3209 (2022)
Published: 16 November 2022
Accepted: 03 November 2021
DOI: 10.11834/jig.210561
Leida Li, Jiachen Duan, Yuzhe Yang, Yaqian Li. Scene-assisted image aesthetic attribute assessment[J]. Journal of Image and Graphics, 27(11): 3199-3209 (2022)
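Objective
Image aesthetic attribute assessment can provide rich aesthetic elements and greatly enhance the interpretability of image aesthetics. However, existing methods for aesthetic attribute assessment do not account for the diversity of image scene categories, so their performance remains unsatisfactory. To address this, we propose a deep multi-task convolutional neural network (MTCNN) model that uses scene information to assist the prediction of image aesthetic attributes.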
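Method
The model consists of a two-stream deep residual network. One stream is trained on a scene prediction task to extract the scene features of the image; the other stream extracts its aesthetic features. The two kinds of features are then fused, and the network is trained through multi-task learning to predict the aesthetic attributes and the overall aesthetic score of the image.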
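The fusion and multi-task objective described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`fuse`, `linear_head`, `multitask_mse`), the toy feature vectors, the weights, and the choice of equal task weights are all assumptions made for illustration. Concatenation is used as the fusion step and mean squared error as the loss, which is what the extended abstract mentions.

```python
from typing import List, Sequence


def fuse(scene_feat: Sequence[float], aesthetic_feat: Sequence[float]) -> List[float]:
    """Fuse the two stream outputs by concatenation."""
    return list(scene_feat) + list(aesthetic_feat)


def linear_head(features: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """One fully connected layer: one output per task (attributes + overall score)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def multitask_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """MSE averaged over all tasks (equal task weights are an assumption)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy features standing in for the outputs of the two residual streams.
scene_feat = [1.0, 0.0]
aesthetic_feat = [0.5, 2.0]
fused = fuse(scene_feat, aesthetic_feat)

# Hypothetical head with three outputs: two attributes and the overall score.
weights = [
    [0.5, 0.0, 1.0, 0.0],      # attribute 1
    [0.0, 1.0, 0.0, 0.25],     # attribute 2
    [0.25, 0.25, 0.5, 0.25],   # overall aesthetic score
]
bias = [0.0, 0.0, 0.0]

pred = linear_head(fused, weights, bias)
target = [1.0, 0.0, 0.5]
loss = multitask_mse(pred, target)

print(fused)  # [1.0, 0.0, 0.5, 2.0]
print(pred)   # [1.0, 0.5, 1.0]
print(loss)   # about 0.1667
```

In a real system the toy vectors would be replaced by the pooled features of the two residual streams, and the head would be trained by gradient descent rather than fixed by hand.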
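Result
To verify the effectiveness of the model, experiments are conducted on the aesthetics and attributes database (AADB). In terms of the Spearman rank-order correlation coefficient (SRCC), the proposed method improves on the best results of other methods by 6.1% on average for aesthetic attribute prediction and by 6.2% for overall aesthetic score prediction.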
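For readers unfamiliar with the metric, SRCC can be computed as the Pearson correlation of the rank-transformed predictions and ground-truth scores. The sketch below is a plain-Python illustration with average ranks for ties; the helper names and the toy scores are hypothetical and are not taken from the paper's evaluation code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srcc(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(truth))


# Toy predicted and ground-truth scores for four images.
predicted = [0.2, 0.5, 0.4, 0.9]
ground_truth = [0.1, 0.6, 0.7, 0.8]
print(srcc(predicted, ground_truth))  # about 0.8
```

Because only the ordering matters, SRCC rewards a model that ranks images correctly even when its absolute scores are offset or rescaled.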
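Conclusion
The proposed method exploits the coupling between scene semantics and aesthetic attributes in images, and effectively improves the accuracy of both aesthetic attribute prediction and aesthetic score prediction.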
Objective
Image aesthetic assessment aims to simulate human perception of beauty and to evaluate the aesthetic quality of images. It is essential for computer vision applications such as image forecasting, photo portfolio management, and image enhancement and retrieval. Current image aesthetic quality assessment methods focus mainly on three tasks: 1) aesthetic binary classification, which divides images into high and low aesthetic quality; 2) aesthetic score regression, which predicts the overall average aesthetic score of an image; and 3) aesthetic distribution prediction, which predicts the probability of each aesthetic rating of an image. Compared with binary classification and score regression, distribution prediction provides more aesthetic information. However, these methods still make limited use of aesthetic prior knowledge and struggle to explain the source of aesthetic perception. Image attributes such as content, brightness, depth of field and color richness carry rich aesthetic information. As a "hub" between low-level image features and aesthetic quality, these attributes can enhance the interpretability of aesthetic evaluation and play an important role in image aesthetic quality assessment. The aesthetic quality of an image is commonly judged in the context of a specific scene. Specifically, people make aesthetic judgments according to multiple aesthetic attributes. There is a strong correlation between aesthetic attributes and aesthetic quality, and the aesthetic attributes can provide interpretable details for aesthetic quality assessment. For instance, when assessing a portrait image, we focus on the details of the foreground rather than those of the background. In contrast, when assessing a landscape image, we treat such details as less important than in a portrait. Hence, we propose an image aesthetic attribute prediction model based on multi-task deep learning, which uses scene information to assist the prediction of image aesthetic attributes and thereby achieves more accurate prediction of image aesthetic scores.
Method
The model consists of a two-stream deep residual network. To obtain the scene information of the image, the first stream is trained on a scene prediction task. The second stream extracts the aesthetic features of the image, and the two kinds of features are then combined and trained through multi-task learning to predict the aesthetic attributes and the overall aesthetic score. Training proceeds in two stages: the scene prediction stream is first trained to predict the image scene category, and the attribute prediction stream is then trained on aesthetic images labeled with attributes. We fuse the features of the two streams by concatenation, and the fully connected layers are trained to output the aesthetic attributes and the overall score jointly. Each aesthetic attribute is predicted as an individual regression score, and the mean squared error (MSE) loss measures the difference between the predicted values and the ground truth. Our experiments are based on the aesthetics and attributes database (AADB). AADB contains 10 000 images, and we follow the standard partition: 8 500 images for training, 500 for validation and the remaining 1 000 for testing. Images are scaled to 256×256×3 before being fed to the network. Experiments run on an Intel i7-10700 CPU and an NVIDIA GTX 1660 Super GPU. The batch size is 12, the number of epochs is 15, and the Adam optimizer is used. The learning rate is 1E-5 for the backbone network and 1E-6 for the fully connected layers. By incorporating image scene information, the proposed model improves the prediction accuracy of both image aesthetic attributes and aesthetic scores.
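The training setup reported above can be collected into a small configuration sketch, which also makes the implied number of optimizer steps explicit. The class name `TrainConfig`, its field names, and the parameter-group labels are hypothetical; the figures are the ones stated in the abstract. Keeping the final partial batch of each epoch is an assumption, since the abstract does not say whether it is dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the abstract; names are illustrative."""
    train_images: int = 8500
    val_images: int = 500
    test_images: int = 1000
    input_shape: tuple = (256, 256, 3)
    batch_size: int = 12
    epochs: int = 15
    backbone_lr: float = 1e-5   # learning rate of the backbone network
    head_lr: float = 1e-6       # learning rate of the fully connected layers

    def total_images(self) -> int:
        return self.train_images + self.val_images + self.test_images

    def steps_per_epoch(self) -> int:
        # Assumption: the last, smaller batch is kept rather than dropped.
        return math.ceil(self.train_images / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs

    def param_groups(self) -> list:
        """Two learning-rate groups, in the style most optimizers accept."""
        return [
            {"name": "backbone", "lr": self.backbone_lr},
            {"name": "fc_head", "lr": self.head_lr},
        ]


config = TrainConfig()
print(config.total_images())     # 10000
print(config.steps_per_epoch())  # 709
print(config.total_steps())      # 10635
```

Under these assumptions the whole run amounts to roughly ten thousand parameter updates, with the fully connected layers updated at a learning rate ten times smaller than the backbone.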
Result
Our method improves the prediction accuracy for the majority of aesthetic attributes, and the correlation coefficient of the overall aesthetic score prediction improves by about 6%, which shows that incorporating scene information into the prediction of aesthetic attributes is feasible.
Conclusion
Integrating scene information into aesthetic attribute prediction reveals the close relation between image scene category and aesthetic attributes, and the experimental results demonstrate that scene information has potential for image aesthetic quality assessment. Future research can focus on the deeper relationship between scene semantics and image aesthetics. Exploiting this relationship could yield a more robust image aesthetic assessment framework, which can consistently improve the performance of image aesthetic quality assessment as well as enhance the interpretability of aesthetic assessment.
Keywords: image aesthetic quality assessment; aesthetic attributes; deep convolution network; multi-task learning; scene classification