Chinese calligraphy font and style classification with a multi-loss fusion network

Cheng Wenyan; Zhou Yong; Tao Chengying; Liu Li; Li Zhigang; Qiu Taorong

Published: 2023-08-17
DOI: 10.11834/jig.220252
2023 | Volume 28 | Number 8

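Cheng Wenyan¹, Zhou Yong¹, Tao Chengying², Liu Li¹, Li Zhigang¹, Qiu Taorong¹ (1. School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China; 2. Experimental Primary School, Beijing Normal University, Beijing 100875, China)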

Abstract

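Objective Chinese calligraphy is broad and profound and is an important part of Chinese culture. Calligraphy font and style classification is an active research topic in the calligraphy field. At present, the concepts of calligraphy font and calligraphy style are often confused, and the accuracy of calligraphy style classification is low. To address these problems, this paper distinguishes the two concepts and proposes a siamese convolutional neural network that fuses multiple losses and solves Chinese calligraphy font classification and style classification simultaneously. Method The proposed network contains two weight-sharing branches, each of which extracts features from one input image. To obtain feature representations at different scales, Haar wavelet decomposition is embedded into each branch. Unlike a traditional siamese network, each branch is extended into a classification network. Training combines two different losses, a contrastive loss and a classification loss, so that the network is supervised from two perspectives at once. Specifically, the network uses the contrastive loss to make the distance between the features of two input images from the same class as small as possible and the distance between the features of two input images from different classes as large as possible. In addition, to make full use of the class label of each input image, cross-entropy is applied on each branch as a classification loss. Result Experiments show that the proposed method reaches classification accuracies of 99.90%, 94.09%, 99.38%, and 93.28%, respectively, on two Chinese calligraphy font datasets and two Chinese calligraphy style datasets, higher than the compared methods. The two losses complement each other well, and introducing Haar wavelet decomposition improves classification accuracy on all four datasets, with a more pronounced gain on the style datasets. Conclusion The proposed method achieves satisfactory results on both Chinese calligraphy font classification and style classification, and offers a new approach for research in the calligraphy field.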
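A minimal pure-Python sketch of the two-part training objective described above, evaluated for a single image pair. The abstract does not give the loss formulas, so this assumes the common margin-based contrastive loss (half the squared distance for same-class pairs, half the squared hinge `max(0, margin - d)` for different-class pairs), softmax cross-entropy on each branch's logits, and a total loss equal to the two cross-entropy terms plus `lam` times the contrastive term. The function names, `lam`, `margin`, and their default values are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """L2 distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(f1: Sequence[float], f2: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Assumed margin-based contrastive loss for one image pair.

    Same-class pairs are pulled together (penalty grows with distance);
    different-class pairs are pushed apart until at least `margin` apart.
    """
    d = euclidean_distance(f1, f2)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one image, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(f1, f2, logits1, logits2, label1: int, label2: int,
               lam: float = 0.5, margin: float = 1.0) -> float:
    """Hypothetical weighting: cross-entropy on both branches + lam * contrastive."""
    l_ce = cross_entropy(logits1, label1) + cross_entropy(logits2, label2)
    l_con = contrastive_loss(f1, f2, same_class=(label1 == label2), margin=margin)
    return l_ce + lam * l_con


if __name__ == "__main__":
    # Two branch feature vectors and 4-class logits for an image pair.
    feat_a, feat_b = [0.2, 0.1, 0.0], [0.2, 0.1, 0.6]
    logits_a, logits_b = [2.0, 0.5, 0.1, -1.0], [0.3, 1.8, 0.0, -0.5]
    print(joint_loss(feat_a, feat_b, logits_a, logits_b, label1=0, label2=1))
```

Because the classification terms are computed per branch, every image's label contributes to training even when the pair itself is uninformative, which is the motivation the abstract gives for extending each branch into a classifier.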

Keywords

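Chinese calligraphy; style classification; font classification; multi-loss fusion; siamese convolutional neural network; contrastive loss; cross-entropy loss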

Multi-loss siamese convolutional neural network for Chinese calligraphy font and style classification

Cheng Wenyan¹, Zhou Yong¹, Tao Chengying², Liu Li¹, Li Zhigang¹, Qiu Taorong¹ (1. School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China; 2. Experimental Primary School, Beijing Normal University, Beijing 100875, China)

Abstract

Objective Chinese calligraphy is one of the iconic symbols of Chinese culture, and machine learning and pattern recognition techniques are increasingly needed to digitize and preserve calligraphy artworks. Our research focuses on Chinese calligraphy classification, namely font classification and style classification. However, the distinction between calligraphy font and calligraphy style is often blurred. To address the problem of style classification, we first clarify the difference between font and style, and then propose a novel multi-loss siamese convolutional neural network that handles both problems simultaneously. Method The difference between calligraphy font and style can be summarized as follows. Calligraphy font refers to a broad taxonomy of scripts; for example, the common Chinese calligraphy fonts are the standard, seal, clerical, cursive, and semi-cursive scripts. Calligraphy style, in contrast, is closely tied to individual calligraphers, since each calligrapher has a unique style. Because the differences among styles are subtle, style classification is more challenging than font classification. Most existing research addresses Chinese calligraphy font classification, and only a few studies deal with Chinese calligraphy style classification. The proposed network consists of two weight-shared streams, each of which extracts features from its input image with a convolutional neural network (CNN). The CNN has five convolutional layers, each followed by a max pooling layer; batch normalization is used to speed up training, and ReLU serves as the activation function. Global average pooling then aggregates the feature maps into a compact feature vector.
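To make the shape flow of one stream concrete, the sketch below traces feature-map sizes through five convolution + batch normalization + ReLU + max-pooling stages and then applies global average pooling. The abstract gives neither kernel sizes, channel widths, nor input resolution, so the 3x3 convolutions with padding 1, 2x2 max pooling with stride 2, channel widths `(32, 64, 128, 256, 512)`, and 64x64 input are placeholder assumptions; `trace_stream` and the other names are hypothetical.

```python
from typing import List, Sequence, Tuple


def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial output size of max pooling along one axis."""
    return (size - kernel) // stride + 1


def trace_stream(height: int, width: int,
                 channels: Sequence[int] = (32, 64, 128, 256, 512)
                 ) -> List[Tuple[int, int, int]]:
    """(C, H, W) after each conv + BN + ReLU + max-pool stage (assumed sizes)."""
    shapes = []
    h, w = height, width
    for c in channels:
        h, w = conv_out(h), conv_out(w)  # 3x3 conv, padding 1: size preserved
        # batch normalization and ReLU act element-wise, so the shape is unchanged
        h, w = pool_out(h), pool_out(w)  # 2x2 max pool, stride 2: size halved
        shapes.append((c, h, w))
    return shapes


def global_average_pool(feature_maps: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Average each channel's H x W map, giving one value per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


if __name__ == "__main__":
    for shape in trace_stream(64, 64):
        print(shape)
    # Under these assumptions the last stage yields 512 maps of size 2 x 2,
    # which global average pooling collapses into a 512-dimensional vector.
```

Global average pooling makes the feature length depend only on the number of channels in the last stage, not on the input resolution.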
To obtain a multiresolution representation of the image, Haar wavelet decomposition is embedded into each stream. Unlike a traditional siamese network, each stream of the proposed network is extended into a classification network: the features extracted from each image are fed to a fully connected layer for classification, and a cross-entropy loss is applied to each stream so that the label of each individual image is fully exploited. A contrastive loss constrains the features so that 1) the distance between the features of two input images from the same category is reduced, and 2) the distance between the features of two input images from different categories is enlarged. Overall, the network is trained to optimize the two types of loss, contrastive loss and cross-entropy loss, jointly, with a weight parameter balancing their contributions. Result We carried out extensive experiments to validate the effectiveness of the proposed network. Since there are no public datasets for Chinese calligraphy font and style classification, we collected four datasets: CNCalliFont, CNCalliNoisyFont, CNCalliStyle, and CNCalliNoisyStyle. The CNCalliFont dataset contains 30 000 images of five fonts (clerical, cursive, seal, semi-cursive, and standard), with 6 000 images per font. The CNCalliNoisyFont dataset extends CNCalliFont by adding Gaussian noise. The CNCalliStyle dataset consists of 12 000 images representing the styles of four renowned ancient Chinese calligraphers, namely Ouyang Xun, Yan Zhenqing, Zhao Mengfu, and Liu Gongquan, with 3 000 images per style. All images are grayscale and stored in JPEG format. Likewise, the CNCalliNoisyStyle dataset extends CNCalliStyle by adding Gaussian noise.
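The abstract says Haar wavelet decomposition is embedded in each stream to obtain a multiresolution representation, but not where the sub-bands enter the CNN or how they are fused with the convolutional features. The sketch below therefore illustrates only the decomposition itself, in pure Python, assuming orthonormal scaling (each 2x2 sum or difference divided by 2; some implementations average, that is, divide by 4). The sub-band names follow one common convention, and `haar_dwt2` and `haar_multiscale` are illustrative names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform.

    Every non-overlapping 2x2 block [[a, b], [c, d]] contributes one coefficient
    to each of four half-resolution sub-bands. Height and width must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # approximation (low-pass both ways)
            r_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            r_hl.append((a - b + c - d) / 2)  # left column minus right column
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_multiscale(img: Matrix, levels: int) -> List[Matrix]:
    """Approximation (LL) bands at successively halved resolutions."""
    bands = []
    current = img
    for _ in range(levels):
        current = haar_dwt2(current)[0]
        bands.append(current)
    return bands


if __name__ == "__main__":
    stripes = [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]  # thin vertical strokes
    ll, lh, hl, hh = haar_dwt2(stripes)
    print(ll, lh, hl, hh)  # the vertical-stroke detail appears only in the hl band
```

The detail bands isolate stroke edges by orientation at each scale, which is one plausible reason such a decomposition could help separate styles whose differences lie in fine stroke detail.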
Each dataset is further split into training, validation, and test sets with a ratio of 6:2:2. The training set is used to learn the network parameters, different hyper-parameter configurations are compared on the validation set, and the best configuration is applied to the test set. The classification accuracy averaged over ten random splits serves as the evaluation metric. Experiments on all four datasets show that embedding Haar wavelet decomposition in each stream of the network improves performance, and the gain is larger on the CNCalliStyle and CNCalliNoisyStyle datasets, indicating that Haar wavelet decomposition helps capture the subtle differences among styles. We also compare the proposed network with a version trained with the cross-entropy loss only; performance decreases in that setting, which shows that the two types of loss are complementary. Moreover, we compare the proposed network with popular handcrafted feature-based, CNN-based, and vision transformer-based methods. The handcrafted feature-based methods first extract features such as local binary pattern (LBP), Gabor, and histogram of oriented gradient (HOG), and then apply a support vector machine (SVM) classifier. The CNN-based methods include four recent methods for Chinese calligraphy font and style classification as well as four popular CNNs: AlexNet, Visual Geometry Group (VGG-16), residual neural network (ResNet-50), and Xception. All methods perform worse on the CNCalliStyle dataset; in particular, a very sharp drop is observed for the handcrafted feature-based methods, the four popular CNNs, and the vision transformer-based methods.
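The evaluation protocol above (6:2:2 split, hyper-parameter selection on the validation set, test accuracy averaged over ten random splits) can be sketched as follows. This assumes a plain, non-stratified random split seeded by the split index; a per-class stratified split is an equally plausible reading. `train_fn`, `accuracy_fn`, and `configs` are hypothetical placeholders for the actual training routine, scoring routine, and hyper-parameter grid.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_622(items: Sequence[T], seed: int) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and cut into 60% train / 20% val / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def average_test_accuracy(items, configs, train_fn, accuracy_fn, n_splits=10):
    """Mean test accuracy over `n_splits` random 6:2:2 splits.

    For each split, one model per hyper-parameter configuration is trained on
    the training set; the configuration with the best validation accuracy is
    then scored once on the test set.
    """
    test_accs = []
    for seed in range(n_splits):
        train, val, test = split_622(items, seed)
        models = [train_fn(train, cfg) for cfg in configs]
        best = max(range(len(configs)), key=lambda k: accuracy_fn(models[k], val))
        test_accs.append(accuracy_fn(models[best], test))
    return mean(test_accs)


if __name__ == "__main__":
    images = list(range(30000))  # stand-ins for the 30 000 CNCalliFont images
    train, val, test = split_622(images, seed=0)
    print(len(train), len(val), len(test))  # 18000 6000 6000
```

Keeping the test set out of configuration selection is what makes the averaged test accuracy an unbiased estimate for each split.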
This indicates that these methods cannot capture the subtle differences among styles. The proposed network achieves accuracies of 99.90%, 94.09%, 99.38%, and 93.28% on the four datasets. Conclusion The proposed multi-loss siamese CNN handles two tasks simultaneously, namely Chinese calligraphy font classification and calligraphy style classification, and the two tasks are jointly optimized with the two types of loss.

Keywords

Chinese calligraphy; style classification; font classification; multi-loss; siamese convolutional neural network; contrastive loss; cross-entropy loss
