Image aesthetic attribute assessment fusing scene information

Li Leida1, Duan Jiachen1, Yang Yuzhe2, Li Yaqian2 (1. School of Artificial Intelligence, Xidian University, Xi'an 710071; 2. OPPO Research Institute, Shanghai 200032)

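Abstract
Objective Image aesthetic attribute assessment provides rich aesthetic elements and greatly enhances the interpretability of image aesthetics. However, existing methods for image aesthetic attribute assessment do not account for the diversity of image scene categories, so their performance is less than ideal. To address this, we propose a deep multi-task convolutional neural network (MTCNN) model that uses scene information to assist the prediction of image aesthetic attributes. Method The model consists of a two-stream deep residual network. One stream is trained on a scene prediction task to extract the scene features of an image; the other stream extracts its aesthetic features. The two kinds of features are then fused, and the network is trained by multi-task learning to predict the aesthetic attributes and the overall aesthetic score of the image. Result To verify the effectiveness of the model, experiments were conducted on the aesthetics and attributes database (AADB). In terms of the Spearman rank-order correlation coefficient (SRCC), the results of our method for aesthetic attribute prediction improve on the best values of other methods by 6.1% on average, and the result for overall aesthetic score prediction improves on the best value of other methods by 6.2%. Conclusion The proposed method for image aesthetic attribute prediction exploits the coupling between scene semantics and aesthetic attributes in images, and effectively improves the accuracy of both aesthetic attribute prediction and aesthetic score prediction.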
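The SRCC used to report the results above can be illustrated with a minimal sketch. It assumes the common definition of SRCC as the Pearson correlation of the ranks, with tied values sharing their average rank; the function names `average_ranks` and `srcc` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted, ground_truth):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rp = average_ranks(predicted)
    rg = average_ranks(ground_truth)
    n = len(rp)
    mean_p = sum(rp) / n
    mean_g = sum(rg) / n
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(rp, rg))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_g = sum((b - mean_g) ** 2 for b in rg)
    return cov / sqrt(var_p * var_g)
```

Because SRCC depends only on rank order, a model whose predicted scores are monotonically related to the human ratings reaches 1.0 even if the absolute values differ.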
Keywords
Scene-assisted image aesthetic attribute assessment

Li Leida1, Duan Jiachen1, Yang Yuzhe2, Li Yaqian2(1.School of Artificial Intelligence, Xidian University, Xi'an 710071, China;2.OPPO Research Institute, Shanghai 200032, China)

Abstract
Objective Image aesthetic assessment aims to simulate human perception of beauty and to assess the aesthetic quality of images. It is essential for computer vision applications such as image forecasting, photo portfolio management, image enhancement and retrieval. Current image aesthetic quality evaluation methods focus mainly on three tasks: 1) aesthetic binary classification: divide images into high and low aesthetic quality; 2) aesthetic score regression: predict the overall average aesthetic score of an image; 3) aesthetic distribution prediction: predict the probability of each aesthetic rating of an image. From binary classification to score regression to distribution prediction, progressively more aesthetic information is provided. However, these methods are still limited by a lack of aesthetic prior knowledge and struggle to explain the source of aesthetic perception. Image attributes carry rich aesthetic information, such as content, brightness, depth of field and color richness. As a "hub" between low-level image features and aesthetic quality, these attributes can enhance the interpretability of aesthetic evaluation and play an important role in image aesthetic quality assessment. Specifically, people make aesthetic judgments according to multiple aesthetic attributes. There is a strong correlation between aesthetic attributes and aesthetic quality, and the aesthetic attributes can provide interpretable details for aesthetic quality assessment. In addition, the aesthetic quality of an image is commonly judged with respect to its specific scene. For instance, to assess a portrait image, we focus on the details of the foreground rather than those of the background. In contrast, when assessing a landscape image, we tend to treat such details as less important than in a portrait image. 
Hence, we propose an image aesthetic attribute prediction model based on multi-task deep learning, which uses scene information to assist the prediction of image aesthetic attributes and achieves more accurate image aesthetic score prediction. Method The model consists of a two-stream deep residual network. To obtain the scene information of the image, the first stream of the network is trained on the scene prediction task. The second stream extracts the aesthetic features of the image, and the two kinds of features are then combined and trained through multi-task learning to predict the aesthetic attributes and the overall aesthetic score of the image. Training proceeds in two stages: we first train the first stream to predict the image scene category, and after the scene prediction stream is trained, we train the attribute prediction stream on aesthetic images labeled with attributes. We use concatenation to fuse the features of the two streams, and the fully connected layers are trained to obtain the joint distribution of the aesthetic attributes and the overall score. Each image aesthetic attribute is predicted as an individual regression score, and a mean squared error (MSE) loss function measures the difference between the predicted value and the ground truth. Our experiments are based on the aesthetics and attributes database (AADB). AADB consists of a total of 10 000 images, and we follow the standard partition of 8 500 images for training, 500 images for validation and the remaining 1 000 images for testing. We scale the images to 256×256×3 before feeding them to the network. The experiments run on an i7-10700 CPU and an NVIDIA GTX 1660 SUPER GPU. The batch size is set to 12, the number of epochs is set to 15, and the Adam optimization algorithm is used. 
The learning rate of the backbone network is set to 1E-5, and the learning rate of the fully connected network is set to 1E-6. By incorporating image scene information, the proposed model improves the prediction accuracy of both the image aesthetic attributes and the aesthetic scores. Result Our method improves the prediction accuracy of the majority of aesthetic attributes, and the correlation coefficient of the overall aesthetic score prediction improves by about 6%, which indicates that fusing scene information into the prediction of aesthetic attributes is feasible. Conclusion Integrating scene information into aesthetic attribute prediction clarifies the close relation between image scene category and aesthetic attributes, and the experimental results demonstrate that scene information has potential for image aesthetic quality assessment. Future research can focus on the deeper relationship between scene semantics and image aesthetics. Exploiting this relationship could yield a more robust image aesthetic assessment framework, which can consistently improve the performance of image aesthetic quality assessment and enhance the interpretability of aesthetic assessment.
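The feature fusion by concatenation and the per-attribute MSE objective described in the Method part can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss weights, so every task is weighted equally here, and the names `fuse`, `mse` and `multitask_mse` are hypothetical.

```python
def fuse(scene_features, aesthetic_features):
    """Fuse the outputs of the two streams by concatenation."""
    return list(scene_features) + list(aesthetic_features)


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def multitask_mse(pred_batch, target_batch):
    """Sum of per-task MSE losses over a batch.

    Each row holds one image's outputs: one regression score per
    aesthetic attribute, with the overall aesthetic score as the
    last entry. Equal task weights are an assumption.
    """
    num_tasks = len(target_batch[0])
    per_task = [
        mse([row[k] for row in pred_batch], [row[k] for row in target_batch])
        for k in range(num_tasks)
    ]
    return sum(per_task), per_task


# Usage: two images, one attribute plus the overall score.
total, per_task = multitask_mse([[0.5, 1.0], [0.0, 0.0]],
                                [[0.0, 1.0], [0.0, 1.0]])
```

Returning the per-task losses alongside the total makes it easy to see which attribute dominates the objective; in a deep learning framework the same sum would be backpropagated through the fully connected layers and both streams.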
Keywords
