Automatic art analysis based on adaptive multi-task learning
2022, Vol. 27, No. 4, Pages: 1226-1237
Received: 2020-11-17; Revised: 2021-02-18; Accepted: 2021-02-25; Published in print: 2022-04-16
DOI: 10.11834/jig.200648

Objective
The digitization of artworks offers a great opportunity to study art from the perspective of computer vision. To better provide artwork classification and art retrieval for digital art museums, help people gain a deeper understanding of the connotations of artworks, promote traditional culture, and support cultural heritage preservation, this paper introduces multi-task learning into automatic art analysis and proposes an original adaptive multi-task learning method based on Bayesian theory.
Method
Based on hierarchical Bayesian theory, the correlation between tasks is exploited and a task-cluster constraint is introduced into the loss function model. Following the Bayesian modeling approach, a multi-task loss function is constructed by maximizing the Gaussian likelihood with uncertainty, yielding an adaptive multi-task learning model. This model can be conveniently extended to any similar learning task and, compared with other state-of-the-art models, achieves better learning performance and analysis results.
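Written out explicitly, the objective that this description points to takes roughly the following form (a hedged reconstruction of the standard homoscedastic-uncertainty loss for T tasks; the task-cluster penalty term and its coefficient λ are illustrative assumptions rather than the paper's exact formulation):

```latex
% Negative log of the joint Gaussian likelihood over T tasks, plus an
% illustrative task-cluster penalty:
\begin{equation}
\mathcal{L}(W,\sigma_{1},\dots,\sigma_{T})
  = \sum_{t=1}^{T}\left(\frac{1}{2\sigma_{t}^{2}}\,\mathcal{L}_{t}(W)
      + \log\sigma_{t}\right)
  + \lambda\sum_{c}\sum_{t\in c}
      \bigl\| \mathbf{w}_{t}-\bar{\mathbf{w}}_{c}\bigr\|^{2}
\end{equation}
% Here \mathcal{L}_{t} is the loss of task t, \sigma_{t} its learned noise
% (uncertainty), \mathbf{w}_{t} the task-specific parameters, and
% \bar{\mathbf{w}}_{c} the mean of the parameters of the tasks in cluster c.
```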
Result
The proposed method solves the difficult problem of choosing the relative weights between the losses of the individual tasks in multi-task learning, determining the loss weights automatically. To evaluate its performance, artwork classification and cross-modal art retrieval experiments are conducted on the multi-modal semantic art understanding dataset SemArt. The classification results show that, compared with multi-task learning with fixed weights, the proposed method improves accuracy on the "Timeframe" attribute by 4.43%, and it also outperforms existing methods that determine loss weights automatically. The cross-modal retrieval results show that, compared with the latest knowledge-graph-based model using the "Author" attribute, the improvement is 9.91%, consistent with the classification results.
Conclusion
The proposed method can adaptively learn the weight of each task within a multi-task learning framework and significantly improves the performance of automatic art analysis tasks compared with currently popular methods.
Objective
To improve learning efficiency and prediction accuracy, multi-task learning tackles multiple tasks jointly under the assumption that the tasks share generic features on top of which task-specific features are learned. Multi-task learning has been applied in a variety of computer vision applications, including object detection and tracking, object recognition, person identification, and facial attribute classification. The worldwide digitization of artwork has opened art research to the perspective of computer vision and further facilitated cultural heritage preservation. Automatic artwork analysis studies the art style, the content of a painting, or its associated attributes. Our multi-task learning approach to automatic art analysis builds on historical, social, and artistic information. Existing multi-task joint learning methods combine tasks through a manually tuned weighted sum of losses, which is labor- and time-consuming. Our method supports art classification and art retrieval tools for digital art museums, helping researchers understand the connotations of art more deeply and further supporting research on traditional cultural heritage.
Method
Our multi-objective learning method is based on Bayesian theory. Drawing on hierarchical Bayesian analysis, we exploit the correlation between tasks and introduce task clusters to constrain the model. We then formulate a multi-task loss function by maximizing the Gaussian likelihood with homoscedastic, task-dependent uncertainty in the Bayesian modeling framework.
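The adaptive weighting described here can be sketched in code (a minimal PyTorch illustration of homoscedastic-uncertainty weighting; the class name AdaptiveMultiTaskLoss, the three-task example, and the exact scaling constants are illustrative assumptions, and the task-cluster constraint of the model is omitted for brevity):

```python
import torch
import torch.nn as nn


class AdaptiveMultiTaskLoss(nn.Module):
    """Combines per-task losses with learnable homoscedastic uncertainty.

    For task i with loss L_i, the combined objective is
    sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a learnable
    noise parameter optimized jointly with the network weights.
    """

    def __init__(self, num_tasks):
        super().__init__()
        # s_i = log(sigma_i^2), initialized to 0, i.e. sigma_i = 1.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: sequence of scalar tensors, one per task.
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total


if __name__ == "__main__":
    # Illustrative use with three attribute-classification losses,
    # e.g. "type", "school", and "timeframe".
    criterion = AdaptiveMultiTaskLoss(num_tasks=3)
    losses = [torch.tensor(0.9), torch.tensor(1.4), torch.tensor(0.6)]
    print(criterion(losses))  # single scalar to backpropagate through
```

Because the log-variance parameters are optimized together with the backbone network, tasks with noisier losses automatically receive smaller weights, which is what removes the need for manual weight tuning.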
Result
To evaluate our method on art classification and art retrieval tasks, we adopt the SemArt dataset, a recent multi-modal benchmark for semantic art understanding. It was designed for cross-modal retrieval of art paintings and can readily be adapted to art painting classification. The dataset contains 21 384 painting images, randomly split into training, validation, and test sets of 19 244, 1 069, and 1 069 samples, respectively. First, we conduct art classification experiments on SemArt and evaluate performance by classification accuracy, i.e., the proportion of correctly predicted paintings among all paintings in the test set. The classification results show that our adaptive multi-task learning model outperforms the previous multi-task learning model in which the weight of each task is fixed; for example, on the "Timeframe" classification task the improvement is about 4.43%. Previous models that compute task-specific weights are also limited by requiring two forward-backward passes. The classification results further validate the importance of the weighting constraints introduced in our model. Next, we evaluate our model on cross-modal art retrieval tasks. Experiments follow the Text2Art Challenge Evaluation, in which paintings are ranked by their similarity to a given text, and vice versa. The rankings are evaluated on the test set by median rank and recall at K, with K being 1, 5, and 10. Median rank is the value separating the higher half of the ranking positions of the relevant items over all samples, whereas recall at K is the proportion of samples whose relevant item appears in the top K positions of the ranking. Compared with the most recent knowledge-graph-based model using the author attribute, the average improvement is about 9.91%, which is consistent with the classification results. Finally, we compare our model with human evaluators. Given an artistic text consisting of a comment, title, author, type, school, and timeframe, participants are asked to pick the most appropriate painting from a collection of 10 images. The task has two difficulty levels: in the easy level the 10 paintings are randomly selected from the test set, whereas in the difficult level the 10 paintings share the same attribute category (e.g., portraits or landscapes). Each participant performs the task for 100 artistic texts at each level, and performance is reported as the proportion of correct choices over all responses. The results show that our model's accuracy is close to that of the human evaluators.
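As a concrete reference for these two retrieval metrics, the sketch below computes median rank and recall at K from a list of ranking positions (a minimal, self-contained illustration; in the actual evaluation the positions come from sorting paintings or texts by cross-modal similarity, which is outside this snippet):

```python
from statistics import median


def recall_at_k(ranks, k):
    """Fraction of queries whose relevant item appears in the top-k positions."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median position of the relevant item over all queries (lower is better)."""
    return median(ranks)


# Example: ranking position of the matching painting for five text queries.
ranks = [1, 3, 12, 2, 7]
print(median_rank(ranks))        # 3
print(recall_at_k(ranks, k=1))   # 0.2
print(recall_at_k(ranks, k=5))   # 0.6
print(recall_at_k(ranks, k=10))  # 0.8
```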
Conclusion
We present an adaptive multi-task learning method that uses Bayesian theory to weight multiple loss functions for automatic art analysis tasks. Furthermore, we conduct several experiments on a publicly available art dataset, covering both art classification and art retrieval challenges.