Recognition of Chinese characters using a deep convolutional neural network
2018, Vol. 23, No. 3, pp. 410-417
Received: 2017-07-19; Revised: 2017-11-16; Published in print: 2018-03-16
DOI: 10.11834/jig.170399
Objective
The recognition of multi-font Chinese characters has broad application prospects in automatic Chinese text processing and intelligent input, and it is an important subject in the field of pattern recognition. With the emergence of deep learning in recent years, the recognition of Chinese characters based on deep convolutional neural networks has made breakthroughs in both method and performance. However, existing approaches still suffer from several problems, such as the need for a large sample size, long training times, and great difficulty in parameter tuning. As a result, achieving the best recognition results for Chinese characters, which span a very large number of categories, remains difficult.
Method
An end-to-end deep convolutional neural network model was proposed for unoccluded images of printed and handwritten Chinese characters. Excluding additional layers, such as batch normalization and dropout layers, the network mainly consisted of three convolutional layers, two pooling layers, one fully connected layer, and a softmax regression layer.
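As a rough illustration of this layout, the PyTorch sketch below stacks three convolutional layers, two pooling layers, one fully connected layer, and a softmax classifier over the 3,755 classes. The channel counts, kernel sizes, and the 64×64 input resolution are illustrative assumptions rather than values reported in the paper, whose experiments were implemented on Theano.

```python
import torch
import torch.nn as nn

class HCCRNet(nn.Module):
    """Sketch of the described 3-conv / 2-pool / 1-FC network.

    Channel counts, kernel sizes, and the 64x64 grayscale input
    are assumptions; only the layer counts come from the paper.
    """
    def __init__(self, num_classes=3755):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # conv 1
            nn.MaxPool2d(2),                                                  # pool 1
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), # conv 2
            nn.MaxPool2d(2),                                                  # pool 2
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU() # conv 3
        )
        # A 64x64 input becomes 16x16 feature maps after two 2x2 poolings.
        self.classifier = nn.Linear(256 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        # Raw class scores; softmax is applied by nn.CrossEntropyLoss in training.
        return self.classifier(x)
```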
To address the problem of a small sample size, a data augmentation method was proposed that combines wave distortion, translation, rotation, and zooming. A large number of pseudo-samples can be generated by randomly varying the translation offsets, the zooming scales, the rotation angles, and the amplitude and period of the sine function that produces the wave distortion. These transformations leave the overall structure of the characters unchanged, so the training set can in principle be enlarged without limit.
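A minimal NumPy/SciPy sketch of such an augmentation pipeline is given below; the amplitude, period, shift, angle, and scale ranges are assumptions for illustration, since the paper does not report them.

```python
import numpy as np
from scipy import ndimage

def wave_distort(img, amplitude, period):
    """Shift each pixel row horizontally by a sine offset, bending the glyph."""
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        offset = int(round(amplitude * np.sin(2.0 * np.pi * y / period)))
        out[y] = np.roll(img[y], offset)
    return out

def make_pseudo_sample(img, rng):
    """Generate one pseudo-sample from a 2-D grayscale array.

    All parameter ranges below are assumptions, not the paper's values.
    """
    img = wave_distort(img,
                       amplitude=rng.uniform(1.0, 3.0),
                       period=rng.uniform(20.0, 40.0))
    img = ndimage.shift(img, shift=rng.uniform(-2, 2, size=2))        # translation
    img = ndimage.rotate(img, angle=rng.uniform(-10, 10), reshape=False)
    img = ndimage.zoom(img, zoom=rng.uniform(0.9, 1.1))               # scaling
    return img  # in practice, resample back to the fixed network input size

rng = np.random.default_rng(0)
# pseudo = make_pseudo_sample(char_image, rng)  # char_image: 2-D float array
```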
Advanced strategies, such as batch normalization and fine-tuning the model with a combination of two optimizers, namely, stochastic gradient descent (SGD) and adaptive moment estimation (Adam), were used to reduce the difficulty of parameter adjustment and shorten the long training time. Batch normalization normalizes the input data of a layer over each training mini-batch during stochastic gradient descent, so that the distribution in each dimension becomes a stable distribution with mean 0 and standard deviation 1. We define internal covariate shift as the change in the distribution of network activations caused by the change in network parameters during training. Without batch normalization, the network must adapt to a different distribution at each iteration, which greatly reduces training speed; batch normalization is an effective way to solve this problem. In the proposed network, the batch normalization layer was placed immediately before each activation function layer.
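For reference, the training-mode forward pass of batch normalization (Ioffe and Szegedy) can be sketched in a few lines of NumPy; gamma and beta are the learned scale and shift, and x holds one mini-batch with one row per sample.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a mini-batch x of shape (batch, dims)."""
    mean = x.mean(axis=0)                    # per-dimension mini-batch mean
    var = x.var(axis=0)                      # per-dimension mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # mean 0, standard deviation 1
    return gamma * x_hat + beta              # learned scale and shift
```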
In classic convolutional neural networks, mini-batch stochastic gradient descent is usually adopted for training. However, selecting suitable hyper-parameters, such as the learning rate and initial weights, is difficult, and these choices greatly affect training speed and classification results. Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments: it computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. Its greatest advantages are that the magnitudes of parameter updates are invariant to rescaling of the gradient and that training can be accelerated tremendously. However, using this method alone cannot ensure state-of-the-art results. Therefore, this paper presents a new training method that combines the novel optimizer, Adam, with the traditional one, SGD. The training process was divided into two steps. First, Adam was adopted to adjust parameters such as the learning rate automatically, avoiding manual tuning and making the network converge quickly; this stage lasted for 200 iterations, after which the best model was saved. Second, SGD was used to further fine-tune the trained model with a very small learning rate, initially set to 0.0001 and exponentially decayed, to achieve the best classification result. Through these methods, the network performed well in terms of both training speed and generalization ability.
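A PyTorch sketch of this two-stage schedule is shown below, reusing the hypothetical HCCRNet from the architecture sketch. The data loader, decay factor, and save path are assumptions; only the 200 Adam iterations and the 0.0001 exponentially decayed SGD learning rate are taken from the paper.

```python
import torch

model = HCCRNet()                                   # hypothetical sketch from above
loss_fn = torch.nn.CrossEntropyLoss()

# Stage 1: Adam converges quickly without manual learning-rate tuning.
adam = torch.optim.Adam(model.parameters())
for _, (x, y) in zip(range(200), train_loader):     # train_loader: assumed DataLoader
    adam.zero_grad()
    loss_fn(model(x), y).backward()
    adam.step()
torch.save(model.state_dict(), "best_adam.pt")      # keep the best stage-1 model

# Stage 2: fine-tune with SGD at a small, exponentially decaying learning rate.
model.load_state_dict(torch.load("best_adam.pt"))
sgd = torch.optim.SGD(model.parameters(), lr=1e-4)
decay = torch.optim.lr_scheduler.ExponentialLR(sgd, gamma=0.99)  # factor assumed
for x, y in train_loader:
    sgd.zero_grad()
    loss_fn(model(x), y).backward()
    sgd.step()
decay.step()  # one decay step per pass; the paper's exact schedule is not specified
```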
Result
A seven-layer deep model was trained to classify the 3,755 level-1 Chinese characters of the national standard character set, and the recognition accuracy reached 98.336%. The contribution of each proposed method to the final performance of the model was verified by several sets of comparative experiments. The recognition rate on the test samples increased by 8.0%, 0.3%, and 1.4% through data augmentation, combining the two optimizers, and batch normalization, respectively. The training time of the model was 483 minutes shorter than when SGD alone was used and 43 minutes shorter than when batch normalization was not used.
Conclusion
Compared with recognition methods in the literature that combine handcrafted features with convolutional neural networks, the proposed method reduces the workload of manual feature extraction. Compared with the classic convolutional neural network, it has a stronger feature extraction ability, a higher recognition rate, and a shorter training time.
Ding X Q. Chinese character recognition: A review[J]. Acta Electronica Sinica, 2002, 30(9): 1364-1368. [DOI:10.3321/j.issn:0372-2112.2002.09.029]
Jin L W, Zhong Z Y, Yang Z, et al. Applications of deep learning for handwritten Chinese character recognition: A review[J]. Acta Automatica Sinica, 2016, 42(8): 1125-1141. [DOI:10.16383/j.aas.2016.c150725]
Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786):504-507.[DOI:10.1126/science.1127647]
Ranzato M A, Poultney C, Chopra S, et al. Efficient learning of sparse representations with an energy-based model[C]//Proceedings of the 20th Annual Conference on Neural Information Processing Systems. Vancouver, BC, Canada:MIT Press, 2007:1137-1144.
Ciresan D C, Meier U, Gambardella L M, et al. Convolutional neural network committees for handwritten character classification[C]//Proceedings of 2011 International Conference on Document Analysis and Recognition. Beijing, China: IEEE, 2011: 1135-1139. [DOI:10.1109/ICDAR.2011.229]
Ciregan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 3642-3649. [DOI:10.1109/CVPR.2012.6248110]
Pan W S, Jin L W, Feng Z Y. Recognition of Chinese characters based on multi-scale gradient and deep neural network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(4): 751-756. [DOI:10.13700/j.bh.1001-5965.2014.0499]
Zhong Z Y, Jin L W, Xie Z C. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps[C]//Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR). Tunis, Tunisia: IEEE, 2015: 846-850. [DOI:10.1109/ICDAR.2015.7333881]
Yang W X, Jin L W, Tao D C, et al. DropSample:A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition[J]. Pattern Recognition, 2016, 58:190-203.[DOI:10.1016/j.patcog.2016.04.007]
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[EB/OL]. 2015-03-02[2017-06-14]. https://arxiv.org/abs/1502.03167.
Kingma D P, Ba J. Adam: A method for stochastic optimization[EB/OL]. 2017-01-30[2017-06-24]. https://arxiv.org/abs/1412.6980.
Bastien F, Lamblin P, Pascanu R, et al. Theano: New features and speed improvements[EB/OL]. 2012-11-23[2017-07-01]. https://arxiv.org/abs/1211.5590.