Multi-loss siamese convolutional neural network for Chinese calligraphy font and style classification
2023, Vol. 28, No. 8, pp. 2370-2381
Print publication date: 2023-08-16
DOI: 10.11834/jig.220252
Cheng Wenyan, Zhou Yong, Tao Chengying, Liu Li, Li Zhigang, Qiu Taorong. 2023. Multi-loss siamese convolutional neural network for Chinese calligraphy font and style classification. Journal of Image and Graphics, 28(08):2370-2381
Objective
Chinese calligraphy is profound and is an important part of Chinese culture. Calligraphy font and style classification is a research hotspot in the calligraphy field. At present, the two concepts of calligraphy font and calligraphy style are often confused, and the accuracy of style classification is low. To address these problems, this paper distinguishes the two concepts and proposes a siamese convolutional neural network that fuses multiple losses, solving Chinese calligraphy font classification and style classification simultaneously.
Method
The proposed network contains two weight-sharing branches, each of which extracts features from an input image. To obtain feature representations at different scales, Haar wavelet decomposition is embedded into each branch. Unlike a traditional siamese neural network, each branch is extended into a classification network. During training, two different types of loss, the contrastive loss and the classification loss, are fused, so that training is supervised from two perspectives simultaneously. Specifically, the contrastive loss makes the distance between the features of two input images from the same class as small as possible, and the distance between the features of two input images from different classes as large as possible. In addition, to fully exploit the category information of each input image, cross-entropy is adopted as the classification loss on each branch.
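The pairwise objective described here is the standard contrastive loss used in siamese networks. The text does not give the equation, so the following is a common formulation rather than the paper's own (the margin $m$ is an assumed hyper-parameter):

```latex
L_{\mathrm{con}} = y\, d^{2} + (1-y)\, \max(0,\ m-d)^{2},
\qquad d = \lVert f_{1} - f_{2} \rVert_{2}
```

where $f_1$ and $f_2$ are the features produced by the two branches, $y=1$ if the two inputs come from the same class, and $y=0$ otherwise.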
Result
Experimental results show that the proposed method achieves classification accuracies of 99.90%, 94.09%, 99.38% and 93.28% on two Chinese calligraphy font datasets and two Chinese calligraphy style datasets, respectively, higher than the compared methods. The two losses complement each other well, and introducing the Haar wavelet decomposition improves accuracy on all four datasets, with a more pronounced gain on the style datasets.
Conclusion
The proposed method achieves satisfactory results on both Chinese calligraphy font classification and style classification, providing a new idea for research in the calligraphy field.
Objective
Chinese calligraphy is one of the symbolic icons of Chinese culture, and machine learning and pattern recognition techniques are in great demand for the digitization and preservation of calligraphy artworks. Our research focuses on two Chinese calligraphy classification tasks: font classification and style classification. However, the concepts of calligraphy font and calligraphy style are often confused, and the accuracy of style classification remains low. To address these problems, we first distinguish between font and style, and then propose a novel multi-loss siamese convolutional neural network that handles the two tasks simultaneously.
Method
The difference between calligraphy font and calligraphy style can be summarized as follows. A calligraphy font is a broad category of script; the popular Chinese calligraphy fonts are the standard, seal, clerical, cursive and semi-cursive scripts. A calligraphy style, in contrast, is closely tied to an individual calligrapher: each calligrapher has a unique style. Compared with font classification, style classification is more challenging because the differences among styles are subtle. Most existing research is devoted to Chinese calligraphy font classification, and only a few studies address style classification. The proposed network consists of two weight-sharing streams, each of which extracts features from an input image with a convolutional neural network (CNN). In detail, the CNN contains five convolutional layers, each followed by a max-pooling layer; batch normalization is used to speed up training, and ReLU is the activation function. Global average pooling then aggregates the feature maps into a compact feature vector. To obtain a multi-resolution representation of the image, Haar wavelet decomposition is embedded into each stream. Unlike a traditional siamese network, each stream of the proposed network is extended into a classification network: the features extracted from each image are fed into a fully connected layer for classification, and a cross-entropy loss is applied to each stream so that the supervised information of each individual image is fully exploited. The contrastive loss constrains the features in two ways: 1) the distance between the features of two input images from the same category is reduced; 2) the distance between the features of two input images from different categories is enlarged.
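The Haar wavelet decomposition embedded in each stream is not spelled out in the text; below is a minimal single-level sketch in plain Python. The sub-band naming (LL/LH/HL/HH) and the 1/4 normalization are common conventions, not taken from the paper:

```python
def haar_decompose(img):
    """Single-level 2-D Haar wavelet decomposition of a grayscale image
    (a list of rows with even height and width). Returns four
    half-resolution sub-bands: the low-frequency approximation (LL)
    and the horizontal, vertical and diagonal details (LH, HL, HH)."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        ll_r, lh_r, hl_r, hh_r = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom of the block
            ll_r.append((a + b + c + d) / 4)  # average (approximation)
            lh_r.append((a - b + c - d) / 4)  # horizontal detail
            hl_r.append((a + b - c - d) / 4)  # vertical detail
            hh_r.append((a - b - c + d) / 4)  # diagonal detail
        ll.append(ll_r); lh.append(lh_r); hl.append(hl_r); hh.append(hh_r)
    return ll, lh, hl, hh

img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
ll, _, _, _ = haar_decompose(img)
print(ll)  # [[2.5, 4.5], [10.5, 12.5]]
```

Each sub-band halves the spatial resolution, so feeding them into the stream gives the network a multi-resolution view of the stroke patterns.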
Overall, the proposed network is trained by jointly optimizing the two types of loss, the contrastive loss and the cross-entropy loss, with a weight parameter balancing their contributions.
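A minimal sketch of how the two losses could be combined, in plain Python. The margin, the balancing weight `lam` and the exact weighting scheme are illustrative assumptions; the abstract only states that a weight parameter balances the two parts:

```python
import math

def contrastive_loss(f1, f2, same, margin=1.0):
    """Pull same-class feature pairs together; push different-class
    pairs apart until they are at least `margin` apart."""
    d = math.dist(f1, f2)  # Euclidean distance between feature vectors
    return d ** 2 if same else max(0.0, margin - d) ** 2

def cross_entropy(probs, label):
    """Classification loss of one stream for a single sample."""
    return -math.log(probs[label])

def joint_loss(f1, f2, same, p1, y1, p2, y2, lam=0.5):
    """Contrastive loss on the pair plus the cross-entropy of each
    stream, balanced by the weight `lam` (an assumed value)."""
    return (contrastive_loss(f1, f2, same)
            + lam * (cross_entropy(p1, y1) + cross_entropy(p2, y2)))

# A different-class pair already farther apart than the margin
# contributes no contrastive penalty.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))  # 0.0
```

During training, gradients from both terms flow back through the shared weights, which is how the two kinds of supervision complement each other.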
Result
We carried out extensive experiments to validate the effectiveness of the proposed network. Since there are no public datasets for Chinese calligraphy font and style classification, we collected four datasets: CNCalliFont, CNCalliNoisyFont, CNCalliStyle and CNCalliNoisyStyle. The CNCalliFont dataset contains 30 000 images covering five fonts (clerical, cursive, seal, semi-cursive and standard), with 6 000 images per font. The CNCalliNoisyFont dataset extends CNCalliFont by adding Gaussian noise. The CNCalliStyle dataset consists of 12 000 images representing the styles of four famous ancient Chinese calligraphers, namely Ouyang Xun, Yan Zhenqing, Zhao Mengfu and Liu Gongquan, with 3 000 images per style. All images are grayscale and stored in JPEG format. Likewise, the CNCalliNoisyStyle dataset extends CNCalliStyle by adding Gaussian noise. Each dataset is split into a training set, a validation set and a test set at a ratio of 6:2:2. The training set is used to learn the network parameters; different hyper-parameter configurations are compared on the validation set, and the best configuration is applied to the test set. Ten random splits are used, and the average classification accuracy serves as the evaluation metric. Experiments on the four datasets show that embedding the Haar wavelet decomposition in each stream of the network improves performance. The gain is larger on the CNCalliStyle and CNCalliNoisyStyle datasets, indicating that the Haar wavelet decomposition helps capture the subtle differences among styles. We also compare the proposed network with a variant trained with the cross-entropy loss only.
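The 6:2:2 split described above can be sketched as follows; the shuffling procedure and seed handling are assumptions, as the text only gives the ratio and the use of ten random splits:

```python
import random

def split_6_2_2(items, seed=0):
    """Shuffle a dataset and split it into train/validation/test
    subsets at the 6:2:2 ratio used in the experiments."""
    rng = random.Random(seed)  # one seed per random split
    items = list(items)
    rng.shuffle(items)
    n_train = int(0.6 * len(items))
    n_val = int(0.2 * len(items))
    return (items[:n_train],                    # 60% for training
            items[n_train:n_train + n_val],     # 20% for validation
            items[n_train + n_val:])            # remaining 20% for test

train, val, test = split_6_2_2(range(30000))  # CNCalliFont size
print(len(train), len(val), len(test))  # 18000 6000 6000
```

Repeating this with ten different seeds and averaging the test accuracy reproduces the evaluation protocol described above.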
The results show that performance drops when only the cross-entropy loss is used, so the two types of loss are indeed complementary. Moreover, we compare the proposed network with popular handcrafted-feature-based, CNN-based and vision-transformer-based methods. For the handcrafted-feature-based methods, features such as the local binary pattern (LBP), Gabor and the histogram of oriented gradients (HOG) are extracted first, and a support vector machine (SVM) classifier is then applied. For the CNN-based methods, four recent methods for Chinese calligraphy font and style classification are included. In addition, we compare our network with four popular CNNs: AlexNet, VGG-16 (Visual Geometry Group), ResNet-50 (residual neural network) and Xception. All the compared methods degrade on the CNCalliStyle dataset; in particular, a sharp decrease is observed for the handcrafted-feature methods, the four popular CNNs and the vision-transformer-based methods, indicating that they cannot capture the subtle differences among styles. Our method achieves accuracies of 99.90%, 94.09%, 99.38% and 93.28% on the four datasets.
Conclusion
The proposed multi-loss siamese CNN handles Chinese calligraphy font classification and calligraphy style classification simultaneously, and the two tasks are jointly optimized through the two types of loss.
Chinese calligraphy; style classification; font classification; multi-loss siamese convolutional neural network; contrastive loss; cross-entropy loss