An encoder-decoder network for online handwritten mathematical expression synthesis

Yang Chen; Du Jun; Xue Mobai; Zhang Jianshu

doi:10.11834/jig.220894

Intelligent Processing and Recognition of Document Images


An encoder-decoder based generation model for online handwritten mathematical expressions
2023, Vol. 28, No. 8, pp. 2356-2369
Print publication date: 2023-08-16

Yang Chen, Du Jun, Xue Mobai, Zhang Jianshu. 2023. An encoder-decoder based generation model for online handwritten mathematical expressions. Journal of Image and Graphics, 28(08): 2356-2369. DOI: 10.11834/jig.220894.

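Summary

Objective

Online mathematical expression recognition is the task of converting a sequence of handwritten trajectory points captured online into the text of a mathematical expression, and it is widely used on portable devices such as mobile phones and tablets. Training data are well known to be critical for neural networks, but labeled online expression data are expensive to collect, and when training data are insufficient the generalization and robustness of deep neural networks on this task suffer. To address this, an online data generation model based on the encoder-decoder framework is proposed.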

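Method

The model generates the corresponding online trajectory point sequence from a given expression text, so the training set can be expanded flexibly. On the encoder side, the generation model uses a text feature extraction module that incorporates a tree representation, and a location-based attention algorithm is introduced so that the model aligns the input text sequence with the output trajectory sequence. On the decoder side, features of different writers' styles are incorporated, so the model can generate samples in multiple handwriting styles.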
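As a rough illustration of what a tree representation of expression text can look like, the sketch below converts a tokenized LaTeX string into (symbol, parent, relation) triples. The function name `latex_to_triples`, the relation labels, and the restriction to braced superscripts and subscripts are assumptions made for this example; the abstract does not specify the encoder's actual tree format.

```python
def latex_to_triples(tokens):
    """Return (symbol, parent_index, relation) triples for a token list.

    Hypothetical helper: handles only `^`, `_` and brace groups.
    parent_index -1 marks the root of the expression.
    """
    triples = []
    prev = -1        # index of the most recent symbol at the current level
    rel = "start"    # relation the next symbol has to `prev`
    stack = []       # base-symbol index saved for each open brace group
    for tok in tokens:
        if tok == "^":
            rel = "sup"
        elif tok == "_":
            rel = "sub"
        elif tok == "{":
            stack.append(prev)      # remember the base symbol of this group
        elif tok == "}":
            prev = stack.pop()      # leave the group, back to its base symbol
            rel = "right"
        else:
            triples.append((tok, prev, rel))
            prev = len(triples) - 1
            rel = "right"
    return triples


triples = latex_to_triples("x ^ { 2 } + y _ { i }".split())
for symbol, parent, relation in triples:
    print(symbol, parent, relation)
```

Two expressions that share a base symbol but differ in a script, such as `x ^ { 2 }` and `x _ { 2 }`, yield triples that differ only in the relation label, which is the kind of structural similarity a flat token sequence hides.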

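Result

In the experiments, the generation results are first visualized for different types of input text and different writer styles, showing that the model is effective in most cases. Second, the additional data synthesized by the generation model are used to augment the training set for three models: Transformer-TAP (track, attend, and parse), TAP, and DenseTAP-TD (DenseNet TAP with tree decoder), and the performance of each model before and after augmentation is analyzed. With the augmented data, the absolute recognition rates of the three models increase by 0.98%, 1.55%, and 1.06%, and the relative recognition rates increase by 9.9%, 12.37%, and 9.81%, respectively.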

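Conclusion

The proposed online generation model makes augmenting the original dataset more flexible and effectively improves the generalization of online recognition models.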

Abstract

Objective

Advances in digitization and intelligent processing have made it easier to capture and recognize text from paper documents, photos, and other sources. Online mathematical expression recognition is now widely used on portable devices such as mobile phones and tablet PCs. These devices must convert an online handwritten trajectory into mathematical expression text and identify the logical relationships between symbols, such as powers, subscripts, and matrices. With online recognition, an online math calculator can accept handwritten mathematical expressions, which is easier than typing LaTeX for expressions with complex symbol relationships. It also makes instant electronic note-taking feasible in settings such as classes and academic meetings. Encoder-decoder based methods for mathematical expression recognition have developed rapidly. The quality and quantity of training data have a great impact on the performance of deep neural networks, and a lack of data limits the generalization and robustness of the model. In the online setting, the input is a sequence of trajectory points, which must be collected on a real-time handwriting device before it can be annotated. Collecting online data is therefore more expensive than collecting offline data, and models still perform poorly because of insufficient data.

Method

To address these problems, we develop an encoder-decoder based generation model for online handwritten mathematical expressions. Given a mathematical expression text, the model generates the corresponding online trajectory point sequence. By feeding in different style symbols, we can also synthesize expressions in different writing styles. A large amount of near-realistic handwriting data is thus obtained at very low cost, which flexibly expands the training set and helps avoid the underfitting or overfitting caused by a lack of data. In generation tasks, the representation and discrimination ability of the encoder often affects performance directly. The encoder aims to model the input text effectively: representations of different inputs should differ sufficiently, while representations of similar inputs should retain a certain similarity. Intuitively, a tree-structured representation reflects the similarities and differences between expressions to some extent. We therefore design a tree representation-based text feature extraction module for the encoder, which makes full use of the two-dimensional structure information. In addition, there is no explicit correspondence between each character of the input text and the output trajectory points. To align the input text sequence with the output trajectory points, we introduce a location-based attention model into the decoder. At the same time, to generate samples in multiple handwriting styles, we integrate different handwriting style features into the decoder. The decoder synthesizes the skeleton of the trajectory from the input text, and the writing style features render it into different styles.
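One common form of location-based attention for handwriting synthesis is the Gaussian window of Graves (2014), in which the decoder predicts how far to advance along the input text at each step rather than comparing content. The sketch below shows a single decoding step under that assumption; the function name `window_attention`, the projection `W`, `b`, and all shapes are hypothetical, and the abstract does not confirm that the paper uses exactly this formulation.

```python
import numpy as np


def window_attention(text_feats, h, kappa_prev, W, b):
    """One step of Gaussian-window (location-based) attention.

    text_feats: (U, D) encoder features, one row per input token.
    h:          (H,) decoder hidden state at this step.
    kappa_prev: (K,) window centres from the previous step.
    W, b:       hypothetical projection (3K, H) and bias (3K,).
    """
    alpha_hat, beta_hat, kappa_hat = np.split(W @ h + b, 3)
    alpha = np.exp(alpha_hat)                 # importance of each window
    beta = np.exp(beta_hat)                   # inverse width of each window
    kappa = kappa_prev + np.exp(kappa_hat)    # centres can only move forward
    u = np.arange(text_feats.shape[0])        # token positions 0..U-1
    # phi[u] = sum_k alpha_k * exp(-beta_k * (kappa_k - u)^2)
    sq_dist = (kappa[:, None] - u[None, :]) ** 2
    phi = (alpha[:, None] * np.exp(-beta[:, None] * sq_dist)).sum(axis=0)
    context = phi @ text_feats                # (D,) weighted sum of features
    return context, phi, kappa


# Toy shapes: 5 tokens, 4-dim features, 3-dim hidden state, 2 windows.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 4))
h = rng.normal(size=3)
W = np.zeros((6, 3))
b = np.zeros(6)
context, phi, kappa = window_attention(text_feats, h, np.zeros(2), W, b)
```

Because each centre is updated by adding a positive quantity, the window moves monotonically from the first token to the last, which gives the text-to-trajectory alignment without any character-level position labels.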

Result

The proposed method is evaluated in two respects: the visual quality of the generated results and the improvement on recognition tasks. First, we show generation results of varying difficulty, including simple sequences, complex fractions, multi-line expressions, and long text. Second, we select and display generated data with similar and different writing styles. Next, we randomly generate a large number of mathematical expression texts and synthesize online data with the generation model. Finally, we use the synthetic data as data augmentation to train Transformer-TAP (track, attend, and parse), TAP, and DenseTAP-TD (DenseNet TAP with tree decoder). The performance of all three models improves with the synthetic data. The additional data enrich the training set, and the models benefit from more symbol combinations in different writing styles. The absolute recognition rates increase by 0.98%, 1.55%, and 1.06%, and the relative recognition rates increase by 9.9%, 12.37%, and 9.81%, respectively.
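The abstract does not state which baseline the relative figures are measured against. Assuming only that each relative gain equals the absolute gain divided by some baseline quantity (an assumption; the baseline error rate is one possibility), the implied baseline can be recovered from the two reported numbers. The helper `implied_baseline` below is hypothetical and serves only to make that relationship explicit.

```python
# Gains reported in the abstract, in the order Transformer-TAP, TAP, DenseTAP-TD.
models = ["Transformer-TAP", "TAP", "DenseTAP-TD"]
absolute_gain = [0.98, 1.55, 1.06]    # percent
relative_gain = [9.9, 12.37, 9.81]    # percent


def implied_baseline(abs_gain, rel_gain):
    """Baseline b (in percent) such that abs_gain / b == rel_gain / 100.

    Assumption: relative gain = absolute gain / baseline. The abstract
    does not say what the baseline quantity is.
    """
    return abs_gain / (rel_gain / 100.0)


baselines = [implied_baseline(a, r) for a, r in zip(absolute_gain, relative_gain)]
for name, base in zip(models, baselines):
    print(f"{name}: implied baseline of about {base:.2f}%")
```

Under that assumption the three implied baselines come out at roughly 9.9%, 12.5%, and 10.8%; the full paper should be consulted for the actual definition.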

Conclusion

We introduce an online mathematical expression generation method based on the encoder-decoder model. The method generates an online trajectory point sequence from a given expression text and can expand the original dataset more flexibly. Experimental results demonstrate that the synthetic data effectively improve the accuracy of online handwritten mathematical expression recognition and further improve the generalization and robustness of the recognition model.

Keywords

deep learning; handwritten expression recognition; end-to-end network; encoder-decoder; data augmentation

