An encoder-decoder network for online handwritten formula synthesis

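Yang Chen1, Du Jun1, Xue Mobai1, Zhang Jianshu2 (1. National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230026; 2. iFLYTEK Co., Ltd., Hefei 230088)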

Abstract
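Objective Online formula recognition is the task of converting a sequence of handwritten trajectory points captured online into formula text, and it is widely used on portable devices such as mobile phones and tablets. Training data are crucial for neural networks, but labeled online formula data are expensive to obtain, and when training data are insufficient, the generalization and robustness of deep neural networks on this task suffer. To address this, an online data generation model based on the encoder-decoder framework is proposed. Method The model generates the corresponding online trajectory point sequence from a given formula text, so that the training data can be expanded flexibly. On the encoder side, the generation model uses a text feature extraction module that incorporates a tree representation, and it introduces a location-based attention algorithm that aligns the input text sequence with the output trajectory sequence. The decoder side incorporates the style features of different writers, so that the model can generate samples in a variety of handwriting styles. Result In the experiments, the results of the proposed generation method are first visualized for different types of input text and different writer styles, showing that the model is effective in most cases. Second, the additional data synthesized by the generation model are used to augment the training set for the Transformer-TAP (track, attend, and parse), TAP, and DenseTAP-TD (DenseNet TAP with tree decoder) models, and the performance of the three models before and after using the augmented data is analyzed. After training with the augmented data, the absolute recognition rates of the three models increase by 0.98%, 1.55%, and 1.06%, and the relative recognition rates increase by 9.9%, 12.37%, and 9.81%, respectively. Conclusion The proposed online generation model augments the original dataset more flexibly and effectively improves the generalization performance of online recognition models.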
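The abstract reports absolute and relative gains but not the baselines they are measured against. As a rough check, and assuming (this is not stated in the abstract) that each relative gain is simply the absolute gain divided by a baseline value, the implied baselines can be recovered with a few lines of Python. The function name `implied_baseline` is hypothetical and used only for this illustration.

```python
def implied_baseline(absolute_gain, relative_gain):
    """Return b such that relative_gain = absolute_gain / b.

    Both arguments are percentages. Assumption: the relative gain is
    defined as absolute gain over a baseline; the abstract does not say so.
    """
    return absolute_gain / (relative_gain / 100.0)


# (absolute gain, relative gain) in percent, in the order the abstract lists the models.
reported = {
    "Transformer-TAP": (0.98, 9.9),
    "TAP": (1.55, 12.37),
    "DenseTAP-TD": (1.06, 9.81),
}

baselines = {name: implied_baseline(a, r) for name, (a, r) in reported.items()}

for name, b in baselines.items():
    print(f"{name}: implied baseline = {b:.2f}%")
```

Under this assumption the implied baselines fall between roughly 10% and 13%, which would fit an error rate better than a recognition rate; in that reading the "relative" figures would be relative error reductions. This is an inference from the arithmetic only, not something the abstract confirms.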
Keywords
An encoder-decoder based generation model for online handwritten mathematical expressions

Yang Chen1, Du Jun1, Xue Mobai1, Zhang Jianshu2 (1. National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230026, China; 2. iFLYTEK Co., Ltd., Hefei 230088, China)

Abstract
Objective Advances in digitization and intelligent processing have made it easier to capture and recognize text from paper documents, photos, and other sources. Online mathematical expression recognition, which is widely used on portable devices such as mobile phones and tablet PCs, converts an online handwritten trajectory into mathematical expression text and must also identify the logical relationships between symbols, such as powers, subscripts, and matrices. With it, an online calculator can accept handwritten expressions directly, which is easier than typing LaTeX for expressions with complex symbol relationships, and instant electronic note-taking becomes feasible in settings such as classes and academic meetings. Encoder-decoder based methods for mathematical expression recognition have developed rapidly. The quality and quantity of training data strongly affect the performance of deep neural networks, and a lack of data limits the generalization and robustness of the model. In the online setting, the input is a sequence of trajectory points that must be collected on a real-time handwriting device before it can be annotated, so online data are more expensive to collect than offline data, and models still perform poorly when data are insufficient. Method To address these problems, we develop an encoder-decoder based generation model for online handwritten mathematical expressions. Given the text of a mathematical expression, the model generates the corresponding online trajectory point sequence. By feeding in different style symbols, it can also synthesize expressions in different writing styles.
A large amount of near-realistic handwriting data can thus be obtained at very low cost, which flexibly expands the training set and helps avoid the underfitting or overfitting caused by a lack of data. In generation tasks, the representational and discriminative ability of the encoder often directly affects performance. The encoder must model the input text effectively: representations of different inputs should differ sufficiently, while representations of similar inputs should remain similar. Intuitively, a tree-structured representation reflects the similarities and differences between expressions well. We therefore design a text feature extraction module based on tree representation for the encoder, which makes full use of the two-dimensional structure of the expression. In addition, there is no explicit correspondence between the characters of the input text and the output trajectory points. To align the input text sequence with the output trajectory points, we introduce a location-based attention model into the decoder. To generate samples in multiple handwriting styles, we also integrate handwriter style features into the decoder. The decoder synthesizes the skeleton of the trajectory from the input text, and the style features render it in different writing styles. Result The proposed method is evaluated from two aspects: the visual quality of the generated results and the improvement it brings to recognition. First, we show generation results for inputs of varying difficulty, including simple sequences, complex fractions, multi-line expressions, and long text. Second, we select and display generated data with similar and with different writing styles. Next, we randomly generate a large number of mathematical expression texts and synthesize the corresponding online data with the generation model.
Finally, we use the synthetic data as augmentation to train Transformer-TAP (track, attend, and parse), TAP, and DenseTAP-TD (DenseNet TAP with tree decoder). All three models improve with the synthetic data: the additional data enrich the training set, and the models benefit from seeing more symbol combinations in different writing styles. The absolute recognition rates increase by 0.98%, 1.55%, and 1.06%, and the relative recognition rates increase by 9.9%, 12.37%, and 9.81%, respectively. Conclusion We introduce an encoder-decoder based method for generating online mathematical expressions, which produces an online trajectory point sequence from a given expression text. It allows the original dataset to be expanded more flexibly. Experimental results show that the synthetic data improve the accuracy of online handwritten mathematical expression recognition and further improve the generalization and robustness of the recognition model.
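The abstract does not give the form of its location-based attention. As an illustration only, the sketch below assumes a Gaussian-window attention of the kind commonly used in handwriting synthesis, where the window centre can only move forward along the input text, which yields a monotonic text-to-trajectory alignment. The function names, the parameterization, and the one-hot token vectors are all assumptions made for this example, not the authors' implementation.

```python
import math


def gaussian_window(alpha, beta, kappa, num_tokens):
    """Attention weight for each token position u = 0 .. num_tokens - 1.

    Each mixture component k contributes alpha_k * exp(-beta_k * (kappa_k - u)^2).
    """
    return [
        sum(a * math.exp(-b * (k - u) ** 2) for a, b, k in zip(alpha, beta, kappa))
        for u in range(num_tokens)
    ]


def attend(token_vectors, kappa_prev, alpha_hat, beta_hat, kappa_hat):
    """One decoder step of the assumed location-based attention.

    alpha_hat, beta_hat, kappa_hat stand in for raw decoder outputs.
    exp() keeps every parameter positive, so kappa can only increase and
    the window moves left to right over the input tokens.
    """
    alpha = [math.exp(x) for x in alpha_hat]
    beta = [math.exp(x) for x in beta_hat]
    kappa = [kp + math.exp(x) for kp, x in zip(kappa_prev, kappa_hat)]
    weights = gaussian_window(alpha, beta, kappa, len(token_vectors))
    dim = len(token_vectors[0])
    # Context vector: weighted sum of the token embeddings.
    context = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return context, weights, kappa


# Toy input: four tokens with one-hot embeddings (hypothetical).
tokens = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ctx1, w1, k1 = attend(tokens, [0.0], [0.0], [0.0], [0.0])
ctx2, w2, k2 = attend(tokens, k1, [0.0], [0.0], [0.0])
print("step 1 focus:", w1.index(max(w1)), "step 2 focus:", w2.index(max(w2)))
```

In a trained model the three raw parameters would be predicted by the decoder at every step, and the style features mentioned in the abstract would be combined with the context vector before the next trajectory point is emitted; neither part is modelled here.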
Keywords
