A method of radical form and hierarchical structure based handwritten Chinese character error correction

Li Yunqing; Du Jun; Hu Pengfei; Zhang Jianshu

doi:10.11834/jig.220906

Intelligent processing and recognition of document images


2023, Vol. 28, No. 8, pp. 2382-2395
Print publication date: 2023-08-16

Li Yunqing, Du Jun, Hu Pengfei, Zhang Jianshu. 2023. A method of radical form and hierarchical structure based handwritten Chinese character error correction. Journal of Image and Graphics, 28(08):2382-2395. DOI: 10.11834/jig.220906.

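Abstract

Objective

Handwritten Chinese character error correction (HCCEC) is a twofold task: judging whether a character is written correctly and correcting misspelled characters. It is widely applicable in education, where it can help students learn Chinese characters and correct their writing errors. Handwritten Chinese characters have complex spatial structures, diverse writing styles, and a very large vocabulary, and misspelled characters closely resemble correct ones, so the key to HCCEC is modeling a character precisely. To this end, we propose a hierarchical radical network (HRN).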

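Method

Starting from the glyph shape of radicals, HRN exploits similarities in the shape and structure of radicals and uses attention modules to capture fine-grained image features that carry radical information, which makes similar characters easier to tell apart. In addition, drawing on the inherent hierarchical structure of Chinese characters, it models the hierarchical position of each radical with a probability-based decoding approach.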

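Result

In experiments on a handwritten Chinese character dataset, HRN improves accuracy over existing methods by 0.5% on the correct-character test set and by 9.8% on the misspelled-character test set, and improves the correction rate on the misspelled-character test set by 15.3%. Visualization of the attention mechanism confirms that HRN captures fine-grained image features containing radical information. The Euclidean distances between radical representations show that the radical representation vectors learned by HRN encode the glyph structure of the radicals.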
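The distance analysis mentioned above can be illustrated with a minimal sketch: given one vector per radical, compute Euclidean distances and look up each radical's nearest neighbor. The three-dimensional vectors below are invented for illustration and are not learned HRN representations; the expectation is only that visually similar radicals (such as 日 and 目) would end up close together.

```python
import math

# Hypothetical radical embeddings (made-up values, not from the paper).
embeddings = {
    "日": [1.0, 0.2, 0.0],
    "目": [0.9, 0.3, 0.1],
    "心": [-0.5, 1.0, 0.8],
}


def nearest_radical(name):
    """Return the other radical with the smallest Euclidean distance to `name`."""
    others = [r for r in embeddings if r != name]
    return min(others, key=lambda r: math.dist(embeddings[name], embeddings[r]))


dist_similar = math.dist(embeddings["日"], embeddings["目"])
dist_different = math.dist(embeddings["日"], embeddings["心"])
print(nearest_radical("日"), dist_similar, dist_different)
```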

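Conclusion

The proposed HRN distinguishes similar radicals better and therefore separates correct characters from misspelled ones more precisely, with strong robustness and generalization.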

Extended abstract

Objective

Handwritten Chinese character error correction (HCCEC) must cope with the complex hierarchical structure, diverse writing styles, and large vocabulary of Chinese characters. The task has two parts: assessment and correction. Assessment determines whether a given isolated handwritten character is written correctly. Correction locates and corrects the specific errors in a misspelled character. HCCEC differs from handwritten Chinese character recognition (HCCR) in three respects. First, the categories of misspelled characters are effectively endless, which places a high demand on the generalization ability of the model. We assume that the training set contains only correct characters, whereas the test set contains both correct and misspelled characters, so the model must transfer what it has learned to misspelled characters it has never seen. HCCEC can therefore be treated as a generalized zero-shot learning (GZSL) problem. Compared with zero-shot learning, the GZSL test set contains both seen and unseen classes, which makes the setting more realistic and more challenging; in particular, misspelled characters tend to be misclassified as correct ones at test time. Second, misspelled characters can be very similar to correct ones, so the model must capture fine-grained features. Third, beyond what HCCR requires, HCCEC must link each misspelled character to its corresponding correct character.

Method

We exploit the similarity between radicals in shape and structure and propose a hierarchical radical network (HRN). The key to analyzing a Chinese character is to extract its radical and structural information. Similar radicals should lie close together in the representation space. Complete radical information helps distinguish similar characters, which is crucial for the HCCEC task. Structure refers to the two-dimensional spatial layout of the whole character; because Chinese characters are hierarchical, they are modeled through hierarchical decomposition. An attention mechanism captures fine-grained image features that help distinguish similar characters. Specifically, HRN consists of a convolutional neural network-based encoder and two attention modules. To obtain radical representations, all radicals in the dictionary are fed into an embedding layer at the input stage. The first attention module computes attention weights, which are used to score the presence of each radical. The radical attention module then balances the weight of each radical across different Chinese characters. Finally, the hierarchical embedding is used to compute the probability of each character.
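As a rough illustration of how attention weights can yield a presence score for each radical, the sketch below uses each radical embedding as a query over flattened encoder features. The scaled dot-product attention, the sigmoid scoring, and all names (`radical_scores`, `features`, `radical_embeddings`) are assumptions chosen for a compact example; the abstract does not specify the exact formulation used in HRN.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def radical_scores(features, radical_embeddings):
    """For each radical, return (attention weights over positions, presence score).

    features: list of per-position feature vectors from an image encoder.
    radical_embeddings: list of radical vectors with the same dimension.
    """
    dim = len(features[0])
    results = []
    for emb in radical_embeddings:
        # Attention of this radical over all spatial positions.
        weights = softmax([dot(emb, f) / math.sqrt(dim) for f in features])
        # Attention-weighted summary of the image for this radical.
        context = [sum(w * f[k] for w, f in zip(weights, features))
                   for k in range(dim)]
        # Assumed scoring rule: sigmoid of the embedding-context similarity.
        score = 1.0 / (1.0 + math.exp(-dot(emb, context)))
        results.append((weights, score))
    return results


# Toy input: 3 spatial positions, 2-dimensional features, 2 radicals.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
radical_embeddings = [[4.0, 0.0], [0.0, 4.0]]
results = radical_scores(features, radical_embeddings)
for weights, score in results:
    print([round(w, 3) for w in weights], round(score, 3))
```

In this toy input each radical attends mostly to the one position whose feature matches its embedding, which mirrors the qualitative observation that attention can localize radicals.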

Result

Experiments are carried out on an in-house handwritten Chinese character dataset. It contains 401 400 handwritten samples covering 7 000 common characters and 570 misspelled characters, with both character-level and radical-level labels. Three metrics are used to evaluate the models. The first is the F1 score, which measures the ability to judge whether a character is correct. The second is accuracy, which measures classification ability. The third is the correction rate, which measures error correction ability. Compared with existing methods, HRN improves accuracy by 0.5% on the correct-character test set and by 9.8% on the misspelled-character test set, and improves the correction rate on the misspelled-character test set by 15.3%. Ablation experiments verify the effectiveness of the attention modules and of the hierarchical embedding. We also conduct experiments on the Chinese Text in the Wild (CTW) dataset, which contains approximately 1 million Chinese character instances annotated in street view images; accuracy improves by 0.5% there as well. Given the diversity and complexity of CTW, this result indicates the robustness and applicability of HRN. Qualitative results show that the attention module can, to a certain extent, locate the position of each radical.
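The three metrics can be sketched on toy predictions as follows. The abstract does not give their exact definitions, so this example assumes that F1 treats "misspelled" as the positive class, that accuracy is the fraction of samples whose predicted class equals the label, and that the correction rate is the fraction of misspelled samples mapped to the intended correct character. The field names (`label`, `pred`, `fix`, `pred_fix`) and the `*` suffix marking misspelled classes are hypothetical.

```python
def evaluate(samples, misspelled_classes):
    """Return (f1, accuracy, correction_rate) under the assumptions above."""
    tp = sum(s["label"] in misspelled_classes and s["pred"] in misspelled_classes
             for s in samples)
    fp = sum(s["label"] not in misspelled_classes and s["pred"] in misspelled_classes
             for s in samples)
    fn = sum(s["label"] in misspelled_classes and s["pred"] not in misspelled_classes
             for s in samples)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    accuracy = sum(s["pred"] == s["label"] for s in samples) / len(samples)

    # Correction rate is computed only over truly misspelled samples.
    wrong = [s for s in samples if s["label"] in misspelled_classes]
    correction_rate = (sum(s["pred_fix"] == s["fix"] for s in wrong) / len(wrong)
                       if wrong else 0.0)
    return f1, accuracy, correction_rate


misspelled_classes = {"A*", "B*", "C*"}
samples = [
    {"label": "A", "pred": "A"},                                 # correct, recognized
    {"label": "B", "pred": "B*"},                                # correct, flagged as misspelled
    {"label": "A*", "pred": "A*", "fix": "A", "pred_fix": "A"},  # misspelled, corrected
    {"label": "B*", "pred": "C*", "fix": "B", "pred_fix": "C"},  # misspelled, wrong correction
]
f1, accuracy, correction_rate = evaluate(samples, misspelled_classes)
print(f1, accuracy, correction_rate)
```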

Conclusion

We develop a hierarchical radical network based on radical shape. It learns the representation of each radical through the attention mechanism and captures fine-grained features more precisely. As a result, similar radicals are distinguished better and errors in handwritten characters are detected more easily. The proposed model still requires sufficient and effective training samples. Future work may extend the method from isolated characters to text lines.

Keywords

handwritten Chinese character error correction (HCCEC); Chinese character recognition; radical analysis; generalized zero-shot learning (GZSL); attention mechanism; convolutional neural network (CNN)

