Frontiers of intelligent document analysis and recognition: review and prospects

Liu Chenglin; Jin Lianwen; Bai Xiang; Li Xiaohui; Yin Fei

doi:10.11834/jig.221112

Section: Intelligent document image processing and recognition


2023, Vol. 28, No. 8, pp. 2223-2252
Received: 2022-11-16
Revised: 2023-01-26
Published in print: 2023-08-16

Liu Chenglin, Jin Lianwen, Bai Xiang, Li Xiaohui, Yin Fei. 2023. Frontiers of intelligent document analysis and recognition: review and prospects. Journal of Image and Graphics, 28(08): 2223-2252. DOI: 10.11834/jig.221112.

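Abstract (translated from Chinese)

Document analysis and recognition (document recognition for short) converts unstructured document data (images and online handwriting) into structured data that computers can process and understand, and it has very broad application scenarios. Since the 1960s, research on and applications of document recognition methods have received wide attention and made great progress. Thanks to the development and application of deep learning, the performance of document recognition has improved rapidly, and the technology is widely used in document digitization, bill processing, handwriting input, intelligent transportation, document retrieval and information extraction. This paper first introduces the background and technical scope of document recognition and reviews the history of the field, then surveys the research carried out since the rise of deep learning, analyzes the shortcomings of current technology, and suggests research directions that deserve attention. The survey of the state of the art is organized by the main technical steps of document analysis and recognition (document image preprocessing, layout analysis, scene text detection, text recognition, structured symbol and graphics recognition, document retrieval and information extraction); for each step it briefly describes representative work on traditional methods and focuses on new progress in deep learning methods. Overall, the objects of current research are expanding in depth and breadth, processing methods have shifted fully to deep neural network models and deep learning, recognition performance has improved greatly, and application scenarios keep expanding. Based on this analysis, the paper points out that current technology still has clear deficiencies in recognition accuracy and reliability, interpretability, learning ability and adaptability. Finally, it proposes research directions from three perspectives: improving performance, extending applications, and improving learning ability. For improving performance, the research problems include the reliability of text recognition, interpretability, all-element recognition, the long-tail problem, multilingual documents, complex layout segmentation and understanding, and the analysis and recognition of distorted documents. Application extension covers new applications (such as robotic process automation (RPA), text transcription and archaeology) and new technical problems (semantic information extraction, cross-modal fusion, application-oriented reasoning and decision-making). For improving learning ability, the related problems include few-shot learning, transfer learning, multi-task learning, domain adaptation, structured prediction, weakly supervised learning, self-supervised learning, open-set learning and cross-modal learning.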

Abstract

Document analysis and recognition (document recognition for short) aims to convert unstructured documents (typically document images and online handwriting) into structured text to facilitate computer processing and understanding. It is needed in a wide range of applications because documents are pervasive in communication and everyday use. The field has attracted intensive attention and made enormous progress in research and applications since the 1960s. In particular, the recent development of deep learning has boosted the performance of document recognition remarkably compared with traditional methods, and the technology has been applied successfully to document digitization, form processing, handwriting input, intelligent transportation, document retrieval and information extraction.

In this article, we first introduce the background and the techniques involved in document recognition and give an overview of the history of research (divided into four periods according to the objects of research, the methods and the applications). We then review the main research progress, with emphasis on the deep learning based methods developed in recent years. After identifying the shortcomings of current technology, we suggest some important issues for future research.

The review of recent progress is divided into sections corresponding to the main processing steps, namely image pre-processing, layout analysis, scene text detection, text recognition, structured symbol and graphics recognition, and document retrieval and information extraction.

1) Owing to the popularity of camera-captured document images, the main current task in image pre-processing is the rectification of distorted images, while binarization remains a concern. Recent methods are mostly end-to-end deep learning based transformation methods.

2) Layout analysis is divided into physical layout analysis (page segmentation) and logical layout analysis (semantic region segmentation and reading order prediction). Recent page segmentation methods based on the fully convolutional network (FCN) or the graph neural network (GNN) have shown promise. Logical layout analysis has been addressed by deep neural networks that fuse multi-modal information. Table structure analysis is a special task within layout analysis and has been studied intensively in recent years.

3) Scene text detection is a hot topic in both document analysis and computer vision. Deep learning based methods for text detection can be divided into regression-based methods, segmentation-based methods and hybrid methods. FCNs are widely used to extract visual features, on which models are built to predict text regions.

4) Text recognition is the core task in document analysis. We review recent work on handwritten text recognition and scene text recognition, which share some common strategies but also show different preferences. There are two main streams of methods: segmentation-based methods and sequence-to-sequence learning methods. The convolutional recurrent neural network (CRNN) model has received much attention in recent years and is being extended with respect to encoding, decoding and learning strategies, while segmentation-based methods combined with deep learning still perform competitively. A noteworthy trend is the extension of text line recognition to page-level recognition. Following text recognition, we also review work on end-to-end scene text recognition (also called text spotting), in which the text detection and recognition models are learned jointly.

5) Among the symbols and graphics in documents, mathematical expressions and flowcharts have received increasing attention. Recent methods for mathematical expression recognition are mostly image-to-markup generation methods using encoder-decoder models, while graph-based methods are promising for producing both recognition and segmentation results. Flowchart recognition is addressed using structured prediction models such as GNNs.

6) Document retrieval was mainly concerned with keyword spotting in the pre-deep-learning era, while recent work focuses on information extraction (spotting semantic entities) by fusing layout and language information. Pre-trained layout and multi-modal language models show promise, although visual information is not yet considered adequately.

Overall, recent progress shows that the objects of recognition have expanded in breadth and depth, the methods have shifted toward deep neural networks and deep learning, recognition performance has improved steadily, and the technology is being applied to an ever wider range of scenarios. The review also reveals the shortcomings of current technology in accuracy and reliability on various tasks, in interpretability, and in learning ability and adaptability.

Future work is suggested with respect to performance improvement, application extension and improved learning. Issues of performance improvement include the reliability of recognition, interpretability, omni-element recognition, long-tailed recognition, multilingual documents, complex layout analysis and understanding, and the recognition of distorted documents. Issues related to applications include new applications (such as robotic process automation (RPA), text transcription in natural scenes, and archaeology) and the new technical problems that applications involve (such as semantic information extraction, cross-modal fusion, and reasoning and decision-making related to application scenarios). To improve automatic system design, learning ability and adaptability, the relevant learning problems and methods include small-sample learning, transfer learning, multi-task learning, domain adaptation, structured prediction, weakly supervised learning, self-supervised learning, open-set learning and cross-modal learning.


references

Aberdam A ， Litman R ， Tsiper S ， Anschel O ， Slossberg R ， Mazor S ， Manmatha R and Perona P . 2021 . Sequence-to-sequence contrastive learning for text recognition // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 15297 - 15307 ［ DOI： 10.1109/CVPR46437.2021.01505 http://dx.doi.org/10.1109/CVPR46437.2021.01505 ］

Almaz􀆦n J ， Gordo A ， Fornés A and Valveny E . 2014 . Word spotting and recognition with embedded attributes . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 36 （ 12 ）： 2552 - 2566 ［ DOI： 10.1109/TPAMI.2014.2339814 http://dx.doi.org/10.1109/TPAMI.2014.2339814 ］

Álvaro F ， S􀆦nchez J A and Benedí J M . 2014 . Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models . Pattern Recognition Letters ， 35 ： 58 - 67 ［ DOI： 10.1016/j.patrec.2012.09.023 http://dx.doi.org/10.1016/j.patrec.2012.09.023 ］

Álvaro F ， S􀆦nchez J A and Benedí J M . 2016 . An integrated grammar-based approach for mathematical expression recognition . Pattern Recognition ， 51 ： 135 - 147 ［ DOI： 10.1016/j.patcog.2015.09.013 http://dx.doi.org/10.1016/j.patcog.2015.09.013 ］

Ao X ， Zhang X Y and Liu C L . 2022 . Cross-modal prototype learning for zero-shot handwritten character recognition . Pattern Recognition ， 131 ： # 108859 ［ DOI： 10.1016/j.patcog.2022.108859 http://dx.doi.org/10.1016/j.patcog.2022.108859 ］

Baek Y ， Lee B ， Han D ， Yun S and Lee H . 2019 . Character region awareness for text detection // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 9357 - 9366 ［ DOI： 10.1109/CVPR.2019.00959 http://dx.doi.org/10.1109/CVPR.2019.00959 ］

Baek Y ， Shin S ， Baek J ， Park S ， Lee J ， Nam D and Lee H . 2020 . Character region attention for text spotting // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 504 - 521 ［ DOI： 10.1007/978-3-030-58526-6_30 http://dx.doi.org/10.1007/978-3-030-58526-6_30 ］

Bautista D and Atienza R . 2022 . Scene text recognition with permuted autoregressive sequence models // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 178 - 196 ［ DOI： 10.1007/978-3-031-19815-1_11 http://dx.doi.org/10.1007/978-3-031-19815-1_11 ］

Bian X H ， Qin B ， Xin X Z ， Li J W ， Su X F and Wang Y F . 2022 . Handwritten mathematical expression recognition via attention aggregation based bi-directional mutual learning //Proceedings of the 36th AAAI Conference on Artificial Intelligence. ［s.l.］： AAAI： 113 - 121 ［ DOI： 10.1609/aaai.v36i1.19885 http://dx.doi.org/10.1609/aaai.v36i1.19885 ］

Binmakhashen G M and Mahmoud S A . 2019 . Document layout analysis： a comprehensive survey . ACM Computing Surveys ， 52 （ 6 ）： # 109 ［ DOI： 10.1145/3355610 http://dx.doi.org/10.1145/3355610 ］

Bissacco A ， Cummins M ， Netzer Y and Neven H . 2013 . PhotoOCR： reading text in uncontrolled conditions // Proceedings of 2013 IEEE International Conference on Computer Vision . Sydney， Australia ： IEEE： 785 - 792 ［ DOI： 10.1109/ICCV.2013.102 http://dx.doi.org/10.1109/ICCV.2013.102 ］

Bluche T ， Louradour J and Messina R O . 2017 . Scan， attend and read： end-to-end handwritten paragraph recognition with MDLSTM attention // Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition . Kyoto， Japan ： IEEE： 1050 - 1055 ［ DOI： 10.1109/ICDAR.2017.174 http://dx.doi.org/10.1109/ICDAR.2017.174 ］

Bozinovic R and Srihari S N . 1982 . A string correction algorithm for cursive script recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence， PAMI-4 （ 6 ）： 655 - 663 ［ DOI： 10.1109/TPAMI.1982.4767321 http://dx.doi.org/10.1109/TPAMI.1982.4767321 ］

Bresler M ， Průša D and Hlav􀆦č V . 2016 . Online recognition of sketched arrow-connected diagrams . International Journal on Document Analysis and Recognition （IJDAR）， 19 （ 3 ）： 253 - 267 ［ DOI： 10.1007/s10032-016-0269-z http://dx.doi.org/10.1007/s10032-016-0269-z ］

Brown M S and Seales W B . 2001 . Document restoration using 3D shape： a general deskewing algorithm for arbitrarily warped documents // Proceedings of the 8th IEEE International Conference on Computer Vision . Vancouver， Canada ： IEEE： 367 - 374 ［ DOI： 10.1109/ICCV.2001.937649 http://dx.doi.org/10.1109/ICCV.2001.937649 ］

Brunessaux S ， Giroux P ， Grilhères B ， Manta M ， Bodin M ， Choukri K ， Galibert O and Kahn J . 2014 . The Maurdor project： improving automatic processing of digital documents // Proceedings of the 11th IAPR International Workshop on Document Analysis Systems . Tours， France ： IEEE： 349 - 354 ［ DOI： 10.1109/DAS.2014.58 http://dx.doi.org/10.1109/DAS.2014.58 ］

Cao H G ， Bhardwaj A and Govindaraju V . 2009a . A probabilistic method for keyword retrieval in handwritten document images . Pattern Recognition ， 42 （ 12 ）： 3374 - 3382 ［ DOI： 10.1016/j.patcog.2009.02.003 http://dx.doi.org/10.1016/j.patcog.2009.02.003 ］

Cao H G . and Govindaraju V . 2009b . Preprocessing of low-quality handwritten documents using Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence ， 31 （ 7 ）： 1184 - 1194 ［ DOI： 10.1109/TPAMI.2008.126 http://dx.doi.org/10.1109/TPAMI.2008.126 ］

Carbune V ， Gonnet P ， Deselaers T ， Rowley H A ， Daryin A ， Calvo M ， Wang L L ， Keysers D ， Feuz S and Gervais P . 2020 . Fast multi-language LSTM-based online handwriting recognition . International Journal on Document Analysis and Recognition （IJDAR）， 23 （ 2 ）： 89 - 102 ［ DOI： 10.1007/s10032-020-00350-4 http://dx.doi.org/10.1007/s10032-020-00350-4 ］

Carion N ， Massa F ， Synnaeve G ， Usunier N ， Kirillov A and Zagoruyko S . 2020 . End-to-end object detection with transformers // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 213 - 229 ［ DOI： 10.1007/978-3-030-58452-8_13 http://dx.doi.org/10.1007/978-3-030-58452-8_13 ］

Casey R G and Lecolinet E . 1996 . A survey of methods and strategies in character segmentation . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 18 （ 7 ）： 690 - 706 ［ DOI： 10.1109/34.506792 http://dx.doi.org/10.1109/34.506792 ］

Casey R G and Nagy G . 1982 . Recursive segmentation and classification of composite character patterns // Proceedings of the 6th International Conference on Pattern Recognition . Munich， Germany ：［s.n.］： 1023 - 1026

Ch’ng C K ， Chan C S and Liu C L . 2020 . Total-text： toward orientation robustness in scene text detection . International Journal on Document Analysis and Recognition （IJDAR）， 23 （ 1 ）： 31 - 52 ［ DOI： 10.1007/s10032-019-00334-z http://dx.doi.org/10.1007/s10032-019-00334-z ］

Chen X X ， Jin L W ， Zhu Y Z ， Luo C J and Wang T W . 2021 . Text recognition in the wild： a survey . ACM Computing Surveys ， 54 （ 2 ）： # 42 ［ DOI： 10.1145/3440756 http://dx.doi.org/10.1145/3440756 ］

Chen Z ， Yin F ， Yang Q and Liu C L . 2022 . Cross-lingual text image recognition via multi-hierarchy cross-modal mimic . IEEE Transactions on Multimedia ［ DOI： 10.1109/TMM.2022.3183386 http://dx.doi.org/10.1109/TMM.2022.3183386 ］

Chen Z ， Yin F ， Zhang X Y ， Yang Q and Liu C L . 2020 . MuLTReNets： Multilingual text recognition networks for simultaneous script identification and handwriting recognition . Pattern Recognition ， 108 ： # 107555 ［ DOI： 10.1016/j.patcog.2020.107555 http://dx.doi.org/10.1016/j.patcog.2020.107555 ］

Cheng Z Z ， Bai F ， Xu L L ， Zheng G ， Pu S L and Zhou S G . 2017 . Focusing attention： towards accurate text recognition in natural images // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 5086 - 5094 ［ DOI： 10.1109/ICCV.2017.543 http://dx.doi.org/10.1109/ICCV.2017.543 ］

Chow C K . 1957 . An optimum character recognition system using decision functions . IRE Transactions on Electronic Computers， EC-6 （ 4 ）： 247 - 254 ［ DOI： 10.1109/TEC.1957.5222035 http://dx.doi.org/10.1109/TEC.1957.5222035 ］

Ch’ng C K ， Liu Y L ， Sun Y P ， Ng C C ， Luo C J ， Ni Z H ， Fang C M ， Zhang S T ， Han J Y ， Ding E ， Liu J T ， Karatzas D ， Chan C S and Jin L W . 2019 . ICDAR2019 robust reading challenge on arbitrary-shaped text — RRC-ArT // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 1571 - 1576 ［ DOI： 10.1109/ICDAR.2019.00252 http://dx.doi.org/10.1109/ICDAR.2019.00252 ］

Coquenet D ， Chatelain C and Paquet T . 2023 . End-to-end handwritten paragraph text recognition using a vertical attention network . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 1 ）： 508 - 524 ［ DOI： 10.1109/TPAMI.2022.3144899 http://dx.doi.org/10.1109/TPAMI.2022.3144899 ］

Costagliola G ， de Rosa M and Fuccella V . 2014 . Local context-based recognition of sketched diagrams . Journal of Visual Languages and Computing ， 25 （ 6 ）： 955 - 962 ［ DOI： 10.1016/j.jvlc.2014.10.021 http://dx.doi.org/10.1016/j.jvlc.2014.10.021 ］

Cui L ， Xu Y H ， Lv T C and Wei F R . 2021 . Document AI： benchmarks， models and applications ［EB/OL］. ［ 2022-11-01 ］. https://arxiv.org/pdf/2111.08609.pdf https://arxiv.org/pdf/2111.08609.pdf

Das S ， Ma K ， Shu Z X ， Samaras D and Shilkrot R . 2019 . DewarpNet： single-image document unwarping with stacked 3D and 2D regression networks // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 131 - 140 ［ DOI： 10.1109/ICCV.2019.00022 http://dx.doi.org/10.1109/ICCV.2019.00022 ］

Davila L ， Setlur S ， Doermann D ， Kota B U and Govindaraju V . 2021 . Chart mining： a survey of methods for automated chart analysis . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 11 ）： 3799 - 3819 ［ DOI： 10.1109/TPAMI.2020.2992028 http://dx.doi.org/10.1109/TPAMI.2020.2992028 ］

Deng D ， Liu H F ， Li X L and Cai D . 2018 . PixelLink： detecting scene text via instance segmentation // Proceedings of the 32nd AAAI Conference on Artificial Intelligence . New Orleans， USA ： AAAI ［ DOI： 10.1609/aaai.v32i1.12269 http://dx.doi.org/10.1609/aaai.v32i1.12269 ］

Deng Y T ， Kanervisto A ， Ling J and Rush A M . 2017 . Image-to-markup generation with coarse-to-fine attention // Proceedings of the 34th International Conference on Machine Learning . Sydney， Australia ： JMLR.org： 980 - 989 ［ DOI： 10.5555/3305381.3305483 http://dx.doi.org/10.5555/3305381.3305483 ］

Denk T I and Reisswig C . 2019 . BERTgrid： contextualized embedding for 2D document representation and understanding // Proceedings of the 33rd Conference on Neural Information Processing Systems . Vancouver， Canada ： NeurIPS

Devlin J ， Chang M W ， Lee K and Toutanova K . 2019 . BERT： pre-training of deep bidirectional transformers for language understanding // Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies . Minneapolis， USA ： Association for Computational Linguistics： 4171 - 4186 ［ DOI： 10.18653/v1/N19-1423 http://dx.doi.org/10.18653/v1/N19-1423 ］

Ding H S ， Chen K and Huo Q . 2021 . An encoder-decoder approach to handwritten mathematical expression recognition with multi-head attention and stacked decoder // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 602 - 616 ［ DOI： 10.1007/978-3-030-86331-9_39 http://dx.doi.org/10.1007/978-3-030-86331-9_39 ］

Doermann D . 1998 . The indexing and retrieval of document images： a survey . Computer Vision and Image Understanding ， 70 （ 3 ）： 287 - 298 ［ DOI： 10.1006/cviu.1998.0692 http://dx.doi.org/10.1006/cviu.1998.0692 ］

Du Y K ， Chen Z N ， Jia C Y ， Yin X T ， Zheng T L ， Li C X ， Du Y N and Jiang Y G . 2022 . SVTR： scene text recognition with a single visual model // Proceedings of the 31st International Joint Conference on Artificial Intelligence . Vienna， Austria ： ijcai.org ［ DOI： 10.24963/ijcai.2022/124 http://dx.doi.org/10.24963/ijcai.2022/124 ］

Epshtein B ， Ofek E and Wexler Y . 2010 . Detecting text in natural scenes with stroke width transform // Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition . San Francisco， USA ： IEEE： 2963 - 2970 ［ DOI： 10.1109/CVPR.2010.5540041 http://dx.doi.org/10.1109/CVPR.2010.5540041 ］

Fang S C ， Xie H T ， Wang Y X ， Mao Z D and Zhang Y D . 2021 . Read like humans： autonomous， bidirectional and iterative language modeling for scene text recognition // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： #702 ［ DOI： 10.1109/CVPR46437.2021.00702 http://dx.doi.org/10.1109/CVPR46437.2021.00702 ］

Feng W ， He W H ， Yin F ， Zhang X Y and Liu C L . 2019 . TextDragon： an end-to-end framework for arbitrary shaped text spotting // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 9075 - 9084 ［ DOI： 10.1109/ICCV.2019.00917 http://dx.doi.org/10.1109/ICCV.2019.00917 ］

Fujisawa H . 2008 . Forty years of research in character and document recognition —— an industrial perspective . Pattern Recognition ， 41 （ 8 ）： 2435 - 2446 ［ DOI： 10.1016/j.patcog.2008.03.015 http://dx.doi.org/10.1016/j.patcog.2008.03.015 ］

Gao L C ， Huang Y L ， Déjean H ， Meunier J L ， Yan Q Q ， Fang Y ， Kleber F and Lang E . 2019 . ICDAR 2019 competition on table detection and recognition （cTDaR）//Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE ： 1510 - 1515 ［ DOI： 10.1109/ICDAR.2019.00243 http://dx.doi.org/10.1109/ICDAR.2019.00243 ］

Gao L C ， Li Y B ， Du L ， Zhang X P ， Zhu Z Y ， Lu N ， Jin L W ， Huang Y S and Tang Z . 2022 . A survey on table recognition technology . Journal of Image and Graphics ， 27 （ 6 ）： 1898 - 1917

高良才，李一博，都林，张新鹏，朱子仪，卢宁，金连文，黄永帅，汤帜 . 2022 . 表格识别技术研究进展 . 中国图象图形学报， 27 （ 6 ）： 1898 - 1917 ［ DOI： 10.11834/jig.220152 http://dx.doi.org/10.11834/jig.220152 ］

Gao L C ， Yi X H ， Jiang Z R ， Hao L P and Tang Z . 2017 . ICDAR2017 competition on page object detection // Proceedings of the 14th International Conference on Document Analysis and Recognition . Kyoto， Japan ： IEEE： 1417 - 1422 ［ DOI： 10.1109/ICDAR.2017.231 http://dx.doi.org/10.1109/ICDAR.2017.231 ］

Gorski N ， Anisimov V ， Augustin E ， Baret O ， Price D and Simon J C . 1999 . A2iA check reader： a family of bank check recognition systems // Proceedings of the 5th International Conference on Document Analysis and Recognition . Bangalore， India ： IEEE： 523 - 526 ［ DOI： 10.1109/ICDAR.1999.791840 http://dx.doi.org/10.1109/ICDAR.1999.791840 ］

Graves A ， Fernandez S ， Gomez F and Schmidhuber J . 2006 . Connectionist temporal classification： labelling unsegmented sequence data with recurrent neural networks // Proceedings of the 23rd International Conference on Machine Learning . Pittsburgh， USA ： ACM： 369 - 376 ［ DOI： 10.1145/1143844.1143891 http://dx.doi.org/10.1145/1143844.1143891 ］

Graves A ， Liwicki M ， Fern􀆦ndez S ， Bertolami R ， Bunke H and Schmidhuber J . 2009 . A novel connectionist system for unconstrained handwriting recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 31 （ 5 ）： 855 - 868 ［ DOI： 10.1109/TPAMI.2008.137 http://dx.doi.org/10.1109/TPAMI.2008.137 ］

He M H ， Liao M H ， Yang Z B ， Zhong H M ， Tang J ， Cheng W Q ， Yao C ， Wang Y P and Bai X . 2021 . MOST： a multi-oriented scene text detector with localization refinement // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 8809 - 8818 ［ DOI： 10.1109/CVPR46437.2021.00870 http://dx.doi.org/10.1109/CVPR46437.2021.00870 ］

He W H ， Zhang X Y ， Yin F and Liu C L . 2018 . Multi-oriented and multi-lingual scene text detection with direct regression . IEEE Transactions on Image Processing ， 27 （ 11 ）： 5406 - 5419 ［ DOI： 10.1109/TIP.2018.2855399 http://dx.doi.org/10.1109/TIP.2018.2855399 ］

Hinton G E ， Osindero S and Teh Y W . 2006 . A fast learning algorithm for deep belief nets . Neural Computation ， 18 （ 7 ）： 1527 - 1554 ［ DOI： 10.1162/neco.2006.18.7.1527 http://dx.doi.org/10.1162/neco.2006.18.7.1527 ］

Huang L ， Yin F ， Chen Q H and Liu C L . 2013 . Keyword spotting in unconstrained handwritten Chinese documents using contextual word model . Image and Vision Computing ， 31 （ 12 ）： 958 - 968 ［ DOI： 10.1016/j.imavis.2013.10.003 http://dx.doi.org/10.1016/j.imavis.2013.10.003 ］

Huang M X ， Liu Y L ， Peng Z H ， Liu C Y ， Lin D H ， Zhu S G ， Yuan N ， Ding K and Jin L W . 2022a . SwinTextSpotter： scene text spotting via better synergy between text detection and text recognition // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 4583 - 4593 ［ DOI： 10.1109/CVPR52688.2022.00455 http://dx.doi.org/10.1109/CVPR52688.2022.00455 ］

Huang Y P ， Lv T C ， Cui L ， Lu Y T and Wei F R . 2022b . LayoutLMv3： pre-training for document AI with unified text and image masking // Proceedings of the 30th ACM International Conference on Multimedia . Lisboa， Portugal ： ACM： 4083 - 4091 ［ DOI： 10.1145/3503161.3548112 http://dx.doi.org/10.1145/3503161.3548112 ］

Huang Z H ， Xu W and Yu K . 2015 . Bidirectional LSTM-CRF models for sequence tagging ［EB/OL］. ［ 2022-11-01 ］. https://arxiv.org/pdf/1508.01991.pdf https://arxiv.org/pdf/1508.01991.pdf

Jain K ， Namboodiri A M and Subrahmonia J . 2001 . Structure in on-line documents // Proceedings of the 6th International Conference on Document Analysis and Recognition . Seattle， USA ： IEEE： 844 - 848 ［ DOI： 10.1109/ICDAR.2001.953906 http://dx.doi.org/10.1109/ICDAR.2001.953906 ］

Julca-Aguilar F ， Mouchère H ， Viard-Gaudin C and Hirata N S T . 2020 . A general framework for the recognition of online handwritten graphics . International Journal on Document Analysis and Recognition （IJDAR）， 23 （ 2 ）： 143 - 160 ［ DOI： 10.1007/s10032-019-00349-6 http://dx.doi.org/10.1007/s10032-019-00349-6 ］

Julca-Aguilar F D and Hirata N S T . 2018 . Symbol detection in online handwritten graphics using faster R-CNN // Proceedings of the 13th IAPR International Workshop on Document Analysis Systems . Vienna， Austria ： IEEE： 151 - 156 ［ DOI： 10.1109/DAS.2018.79 http://dx.doi.org/10.1109/DAS.2018.79 ］

Kang L ， Riba P ， Rusiñol M ， Fornés A and Villegas M . 2022 . Pay attention to what you read： non-recurrent handwritten text-line recognition ， Pattern Recognition ， 129 ： # 108766 ［ DOI： 10.1016/j.patcog.2022.108766 http://dx.doi.org/10.1016/j.patcog.2022.108766 ］

Karatzas D ， Gomez-Bigorda L ， Nicolaou A ， Ghosh S ， Bagdanov A ， Iwamura M ， Matas J ， Neumann L ， Chandrasekhar V R ， Lu S J ， Shafait F ， Uchida S and Valveny E . 2015 . ICDAR 2015 competition on robust reading // Proceedings of the 13th International Conference on Document Analysis and Recognition . Tunis， Tunisia ： IEEE： 1156 - 1160 ［ DOI： 10.1109/ICDAR.2015.7333942 http://dx.doi.org/10.1109/ICDAR.2015.7333942 ］

Katti A R ， Reisswig C ， Guder C ， Brarda S ， Bickel S ， Höhne J and Faddoul J B . 2018 . Chargrid： towards understanding 2D documents // Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing . Brussels， Belgium ： Association for Computational Linguistics： 4459 - 4469 ［ DOI： 10.18653/v1/D18-1476 http://dx.doi.org/10.18653/v1/D18-1476 ］

Kim G ， Hong T ， Yim M ， Nam J ， Park J ， Yim J ， Hwang W ， Yun S ， Han D and Park S . 2022 . OCR-free document understanding transformer // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 498 - 517 ［ DOI： 10.1007/978-3-031-19815-1_29 http://dx.doi.org/10.1007/978-3-031-19815-1_29 ］

Kimura F ， Takashina K ， Tsuruoka S and Miyake Y . 1987 . Modified quadratic discriminant functions and the application to Chinese character recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence， PAMI-9 （ 1 ）： 149 - 153 ［ DOI： 10.1109/TPAMI.1987.4767881 http://dx.doi.org/10.1109/TPAMI.1987.4767881 ］

Kise K ， Sato A and Iwata M . 1998 . Segmentation of page images using the area Voronoi diagram . Computer Vision and Image Understanding ， 70 （ 3 ）： 370 - 382 ［ DOI： 10.1006/cviu.1998.0684 http://dx.doi.org/10.1006/cviu.1998.0684 ］

Krishnamoorthy M ， Nagy G ， Seth S and Viswanathan M . 1993 . Syntactic segmentation and labeling of digitized pages from technical journals . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 15 （ 7 ）： 737 - 747 ［ DOI： 10.1109/34.221173 http://dx.doi.org/10.1109/34.221173 ］

Le A D ， Indurkhya B and Nakagawa M . 2019 . Pattern generation strategies for improving recognition of handwritten mathematical expressions . Pattern Recognition Letters ， 128 ： 255 - 262 ［ DOI： 10.1016/j.patrec.2019.09.002 http://dx.doi.org/10.1016/j.patrec.2019.09.002 ］

LeCun Y ， Bottou L ， Bengio Y and Haffner P . 1998 . Gradient-based learning applied to document recognition . Proceedings of the IEEE ， 86 （ 11 ）： 2278 - 2324 ［ DOI： 10.1109/5.726791 http://dx.doi.org/10.1109/5.726791 ］

Li B H ， Yuan Y ， Liang D K ， Liu X ， Ji Z L ， Bai J F ， Liu W Y and Bai Z . 2022a . When counting meets HMER： counting-aware network for handwritten mathematical expression recognition // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 197 - 214 ［ DOI： 10.1007/978-3-031-19815-1_12 http://dx.doi.org/10.1007/978-3-031-19815-1_12 ］

Li H ， Wang P and Shen C H . 2017 . Towards end-to-end text spotting with convolutional recurrent neural networks // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 5248 - 5256 ［ DOI： 10.1109/ICCV.2017.560 http://dx.doi.org/10.1109/ICCV.2017.560 ］

Li H ， Wang P ， Shen C H and Zhang G Y . 2019a . Show， attend and read： a simple and strong baseline for irregular text recognition // Proceedings of the 33rd AAAI Conference on Artificial Intelligence . Honolulu， USA ： AAAI： 8610 - 8617 ［ DOI： 10.1609/aaai.v33i01.33018610 http://dx.doi.org/10.1609/aaai.v33i01.33018610 ］

Li X H ， Yin F ， Dai H S and Liu C L . 2022b . Table structure recognition and form parsing by end-to-end object detection and relation parsing . Pattern Recognition ， 132 ： # 108946 ［ DOI： 10.1016/j.patcog.2022.108946 http://dx.doi.org/10.1016/j.patcog.2022.108946 ］

Li X H ， Yin F and Liu C L . 2020 . Page segmentation using convolutional neural network and graphical model // Proceedings of the 14th IAPR International Workshop on Document Analysis Systems . Wuhan， China ： Springer： 231 - 245 ［ DOI： 10.1007/978-3-030-57058-3_17 http://dx.doi.org/10.1007/978-3-030-57058-3_17 ］

Li X H ， Yin F ， Xue T ， Liu L ， Ogier J M and Liu C L . 2019b . Instance aware document image segmentation using label pyramid networks and deep watershed transformation // Proceedings of 2019 International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 514 - 519 ［ DOI： 10.1109/ICDAR.2019.00088 http://dx.doi.org/10.1109/ICDAR.2019.00088 ］

Li Y B ， Huang Y L ， Zhu Z Y ， Pan L M ， Huang Y S ， Du L ， Tang Z and Gao L C . 2021 . Rethinking table structure recognition using sequence labeling methods // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 541 - 553 ［ DOI： 10.1007/978-3-030-86331-9_35 http://dx.doi.org/10.1007/978-3-030-86331-9_35 ］

Liao M H ， Lyu P Y ， He M H ， Yao C ， Wu W H and Bai X . 2021 . Mask TextSpotter： an end-to-end trainable neural network for spotting text with arbitrary shapes . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 2 ）： 532 - 548 ［ DOI： 10.1109/TPAMI.2019.2937086 http://dx.doi.org/10.1109/TPAMI.2019.2937086 ］

Liao M H ， Pang G ， Huang J ， Hassner T and Bai X . 2020a . Mask TextSpotter v3： segmentation proposal network for robust scene text spotting // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 706 - 722 ［ DOI： 10.1007/978-3-030-58621-8_41 http://dx.doi.org/10.1007/978-3-030-58621-8_41 ］

Liao M H ， Shi B G and Bai X . 2018 . Textboxes++： a single-shot oriented scene text detector . IEEE Transactions on Image Processing ， 27 （ 8 ）： 3676 - 3690 ［ DOI： 10.1109/TIP.2018.2825107 http://dx.doi.org/10.1109/TIP.2018.2825107 ］

Liao M H ， Shi B G ， Bai X ， Wang W G and Liu W Y . 2017 . Textboxes： a fast text detector with a single deep neural network // Proceedings of the 31st AAAI Conference on Artificial Intelligence ， San Francisco， USA ： AAAI： 4161 - 4167 ［ DOI： 10.1609/aaai.v31i1.11196 http://dx.doi.org/10.1609/aaai.v31i1.11196 ］

Liao M H ， Wan Z Y ， Yao C ， Chen K and Bai X . 2020b . Real-time scene text detection with differentiable binarization // Proceedings of the 34th AAAI Conference on Artificial Intelligence ， New York， USA ： AAAI： 11474 - 11481 ［ DOI： 10.1609/aaai.v34i07.6812 http://dx.doi.org/10.1609/aaai.v34i07.6812 ］

Liao M H ， Zou Z S ， Wan Z Y ， Yao C and Bai X . 2023 . Real-time scene text detection with differentiable binarization and adaptive scale fusion . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 1 ）： 919 - 931 ［ DOI： 10.1109/TPAMI.2022.3155612 http://dx.doi.org/10.1109/TPAMI.2022.3155612 ］

Lin W H ， Gao Q F ， Sun L ， Zhong Z Y ， Hu K ， Ren Q and Huo Q . 2021 . ViBERTgrid： a jointly trained multi-modal 2D document representation for key information extraction from documents // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 548 - 563 ［ DOI： 10.1007/978-3-030-86549-8_35 http://dx.doi.org/10.1007/978-3-030-86549-8_35 ］

Lin W H ， Sun Z ， Ma C X ， Li M Z ， Wang J W ， Sun L and Huo Q . 2022 . TSRFormer： Table structure recognition with Transformers // Proceedings of the 30th ACM International Conference on Multimedia . Lisboa， Portugal ： ACM： 6473 - 6482 ［ DOI： 10.1145/3503161.3548038 http://dx.doi.org/10.1145/3503161.3548038 ］

Liu C ， Yang C and Yin X C . 2022a . Open-set text recognition via character-context decoupling // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 4513 - 4522 ［ DOI： 10.1109/CVPR52688.2022.00448 http://dx.doi.org/10.1109/CVPR52688.2022.00448 ］

Liu C L . 2019 . Document image recognition： retrospective and perspective of technology . Frontiers of Data and Computing ， 1 （ 2 ）： 17 - 25 ［ DOI： 10.11871/jfdc.issn.2096-742X.2019.02.002 http://dx.doi.org/10.11871/jfdc.issn.2096-742X.2019.02.002 ］

Liu C L ， Yin F ， Wang D H and Wang Q F . 2011 . CASIA online and offline Chinese handwriting databases // Proceedings of the 11th International Conference on Document Analysis and Recognition . Beijing， China ： IEEE： 37 - 41 ［ DOI： 10.1109/ICDAR.2011.17 http://dx.doi.org/10.1109/ICDAR.2011.17 ］

Liu C L ， Yin F ， Wang D H and Wang Q F . 2013 . Online and offline handwritten Chinese character recognition： benchmarking on new databases . Pattern Recognition ， 46 （ 1 ）： 155 - 162 ［ DOI： 10.1016/j.patcog.2012.06.021 http://dx.doi.org/10.1016/j.patcog.2012.06.021 ］

Liu C Y ， Chen X X ， Luo C J ， Jin L W ， Xue Y and Liu Y L . 2021 . Deep learning methods for scene text detection and recognition . Journal of Image and Graphics ， 26 （ 6 ）： 1330 - 1367 ［ DOI： 10.11834/jig.210044 http://dx.doi.org/10.11834/jig.210044 ］

Liu H ， Li X ， Liu B ， Jiang D Q ， Liu Y S ， Ren B and Ji R R . 2021 . Show， read and reason： table structure recognition with flexible context aggregator // Proceedings of the 29th ACM International Conference on Multimedia . ［s.l.］： ACM： 1084 - 1092 ［ DOI： 10.1145/3474085.3481534 http://dx.doi.org/10.1145/3474085.3481534 ］

Liu X B ， Liang D ， Yan S ， Chen D G ， Qiao Y and Yan J J . 2018 . FOTS： fast oriented text spotting with a unified network // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 5676 - 5685 ［ DOI： 10.1109/CVPR.2018.00595 http://dx.doi.org/10.1109/CVPR.2018.00595 ］

Liu X J ， Gao F Y ， Zhang Q and Zhao H S . 2019a . Graph convolution for multimodal information extraction from visually rich documents // Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies . Minneapolis， Minnesota， USA ： Association for Computational Linguistics： 32 - 39 ［ DOI： 10.18653/v1/N19-2005 http://dx.doi.org/10.18653/v1/N19-2005 ］

Liu Y L ， Chen H ， Shen C H ， He T ， Jin L W and Wang L W . 2020 . ABCNet： real-time scene text spotting with adaptive Bezier-curve network // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 9806 - 9815 ［ DOI： 10.1109/CVPR42600.2020.00983 http://dx.doi.org/10.1109/CVPR42600.2020.00983 ］

Liu Y L and Jin L W . 2017 . Deep matching prior network： toward tighter multi-oriented text detection // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 3454 - 3461 ［ DOI： 10.1109/CVPR.2017.368 http://dx.doi.org/10.1109/CVPR.2017.368 ］

Liu Y L ， Jin L W ， Zhang S T ， Luo C J and Zhang S . 2019b . Curved scene text detection via transverse and longitudinal sequence connection . Pattern Recognition ， 90 ： 337 - 345 ［ DOI： 10.1016/j.patcog.2019.02.002 http://dx.doi.org/10.1016/j.patcog.2019.02.002 ］

Liu Y L ， Shen C H ， Jin L W ， He T ， Chen P ， Liu C Y and Chen H . 2022b . ABCNet v2： adaptive Bezier-curve network for real-time end-to-end text spotting . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 44 （ 11 ）： 8048 - 8064 ［ DOI： 10.1109/TPAMI.2021.3107437 http://dx.doi.org/10.1109/TPAMI.2021.3107437 ］

Long S B ， He X and Yao C . 2021 . Scene text detection and recognition： the deep learning era . International Journal of Computer Vision ， 129 ： 161 - 184 ［ DOI： 10.1007/s11263-020-01369-0 http://dx.doi.org/10.1007/s11263-020-01369-0 ］

Lucas S M ， Panaretos A ， Sosa L ， Tang A ， Wong S and Young R . 2003 . ICDAR 2003 robust reading competitions // Proceedings of the 7th International Conference on Document Analysis and Recognition . Edinburgh， UK ： IEEE： 682 - 687 ［ DOI： 10.1109/ICDAR.2003.1227749 http://dx.doi.org/10.1109/ICDAR.2003.1227749 ］

Luo C J ， Jin L W and Chen J D . 2022 . SimAN： exploring self-supervised representation learning of scene text via similarity-aware normalization // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 1029 - 1038 ［ DOI： 10.1109/CVPR52688.2022.00111 http://dx.doi.org/10.1109/CVPR52688.2022.00111 ］

Luo C J ， Jin L W and Sun Z H . 2019 . MORAN： a multi-object rectified attention network for scene text recognition . Pattern Recognition ， 90 ： 109 - 118 ［ DOI： 10.1016/j.patcog.2019.01.020 http://dx.doi.org/10.1016/j.patcog.2019.01.020 ］

Luong M T ， Nguyen T D and Kan M Y . 2010 . Logical structure recovery in scholarly articles with rich document features . International Journal of Digital Library Systems ， 1 （ 4 ）： 1 - 23

Ma K ， Das S ， Shu Z X and Samaras D . 2022 . Learning from documents in the wild to improve document unwarping // ACM SIGGRAPH 2022 Conference Proceedings . Vancouver， Canada ： ACM： #34 ［ DOI： 10.1145/3528233.3530756 http://dx.doi.org/10.1145/3528233.3530756 ］

MacLean S and Labahn G . 2013 . A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets . International Journal on Document Analysis and Recognition （IJDAR）， 16 （ 2 ）： 139 - 163 ［ DOI： 10.1007/s10032-012-0184-x http://dx.doi.org/10.1007/s10032-012-0184-x ］

Mahdavi M ， Zanibbi R ， Mouchere H ， Viard-Gaudin C and Garain U . 2019 . ICDAR 2019 CROHME + TFD： competition on recognition of handwritten mathematical expressions and typeset formula detection // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 1533 - 1538 ［ DOI： 10.1109/ICDAR.2019.00247 http://dx.doi.org/10.1109/ICDAR.2019.00247 ］

Manke S ， Finke M and Waibel A . 1995 . NPen++： a writer independent， large vocabulary on-line cursive handwriting recognition system // Proceedings of the 3rd International Conference on Document Analysis and Recognition . Montreal， Canada ： IEEE： 403 - 408 ［ DOI： 10.1109/ICDAR.1995.599023 http://dx.doi.org/10.1109/ICDAR.1995.599023 ］

Marti U V and Bunke H . 2002 . The IAM-database： an English sentence database for offline handwriting recognition . International Journal on Document Analysis and Recognition ， 5 （ 1 ）： 39 - 46 ［ DOI： 10.1007/s100320200071 http://dx.doi.org/10.1007/s100320200071 ］

Matas J ， Chum O ， Urban M and Pajdla T . 2004 . Robust wide-baseline stereo from maximally stable extremal regions . Image and Vision Computing ， 22 （ 10 ）： 761 - 767 ［ DOI： 10.1016/j.imavis.2004.02.006 http://dx.doi.org/10.1016/j.imavis.2004.02.006 ］

Meng G F ， Pan C H ， Xiang S M ， Duan J Y and Zheng N N . 2012 . Metric rectification of curved document images . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 34 （ 4 ）： 707 - 722 ［ DOI： 10.1109/TPAMI.2011.151 http://dx.doi.org/10.1109/TPAMI.2011.151 ］

Meng G F ， Xiang S M ， Zheng N N and Pan C H . 2013 . Nonparametric illumination correction for scanned document images via convex hulls . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 35 （ 7 ）： 1730 - 1743 ［ DOI： 10.1109/TPAMI.2012.251 http://dx.doi.org/10.1109/TPAMI.2012.251 ］

Mermelstein P and Eden M . 1964 . Experiments on computer recognition of connected handwritten words . Information and Control ， 7 （ 2 ）： 255 - 270 ［ DOI： 10.1016/S0019-9958(64)90142-1 http://dx.doi.org/10.1016/S0019-9958(64)90142-1 ］

Mitra M and Chaudhuri B B . 2000 . Information retrieval from documents： a survey . Information Retrieval ， 2 （ 2 ）： 141 - 163 ［ DOI： 10.1023/A:1009950525500 http://dx.doi.org/10.1023/A:1009950525500 ］

Mori S ， Suen C Y and Yamamoto K . 1992 . Historical review of OCR research and development . Proceedings of the IEEE ， 80 （ 7 ）： 1029 - 1058 ［ DOI： 10.1109/5.156468 http://dx.doi.org/10.1109/5.156468 ］

Motahari H ， Duffy N ， Bennett P and Bedrax-Weiss T . 2020 . A report on the first workshop on document intelligence （DI） at NeurIPS 2019 . ACM SIGKDD Explorations Newsletter ， 22 （ 2 ）： 8 - 11 ［ DOI： 10.1145/3447556.3447563 http://dx.doi.org/10.1145/3447556.3447563 ］

Moysset B ， Kermorvant C and Wolf C . 2017 . Full-page text recognition： learning where to start and when to stop // Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition . Kyoto， Japan ： IEEE： 871 - 876 ［ DOI： 10.1109/ICDAR.2017.147 http://dx.doi.org/10.1109/ICDAR.2017.147 ］

Murase H . 1988 . Online recognition of free-format Japanese handwritings // Proceedings of the 9th International Conference on Pattern Recognition . Rome， Italy ： IEEE： 1143 - 1147 ［ DOI： 10.1109/ICPR.1988.28462 http://dx.doi.org/10.1109/ICPR.1988.28462 ］

Nagy G . 2000 . Twenty years of document image analysis in PAMI . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 22 （ 1 ）： 38 - 62 ［ DOI： 10.1109/34.824820 http://dx.doi.org/10.1109/34.824820 ］

Nagy G . 2016 . Disruptive developments in document recognition . Pattern Recognition Letters ， 79 ： 106 - 112 ［ DOI： 10.1016/j.patrec.2015.11.024 http://dx.doi.org/10.1016/j.patrec.2015.11.024 ］

Nayef N ， Yin F ， Bizid I ， Choi H ， Feng Y ， Karatzas D ， Luo Z B ， Pal U ， Rigaud C ， Chazalon J ， Khlif W ， Luqman M M ， Burie J C ， Liu C L and Ogier J M . 2017 . ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT // Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition . Kyoto， Japan ： IEEE： 1454 - 1459 ［ DOI： 10.1109/ICDAR.2017.237 http://dx.doi.org/10.1109/ICDAR.2017.237 ］

O'Gorman L . 1993 . The document spectrum for page layout analysis . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 15 （ 11 ）： 1162 - 1173 ［ DOI： 10.1109/34.244677 http://dx.doi.org/10.1109/34.244677 ］

Pan Y F ， Hou X W and Liu C L . 2011 . A hybrid approach to detect and localize texts in natural scene images . IEEE Transactions on Image Processing ， 20 （ 3 ）： 800 - 813 ［ DOI： 10.1109/TIP.2010.2070803 http://dx.doi.org/10.1109/TIP.2010.2070803 ］

Peng D Z ， Jin L W ， Liu Y L ， Luo C J and Lai S X . 2022c . PageNet： towards end-to-end weakly supervised page-level handwritten Chinese text recognition . International Journal of Computer Vision ， 130 （ 11 ）： 2623 - 2645 ［ DOI： 10.1007/s11263-022-01654-0 http://dx.doi.org/10.1007/s11263-022-01654-0 ］

Peng D Z ， Jin L W ， Ma W H ， Xie C Y ， Zhang H S ， Zhu S G and Li J . 2022a . Recognition of handwritten Chinese text by segmentation： a segment-annotation-free approach . IEEE Transactions on Multimedia ［ DOI： 10.1109/TMM.2022.3146771 http://dx.doi.org/10.1109/TMM.2022.3146771 ］

Peng D Z ， Wang X Y ， Liu Y L ， Zhang J X ， Huang M X ， Lai S X ， Li J ， Zhu S G ， Lin D H ， Shen C H ， Bai X and Jin L W . 2022b . SPTS： single-point text spotting // Proceedings of the 30th ACM International Conference on Multimedia . Lisboa， Portugal ： ACM： 4272 - 4281 ［ DOI： 10.1145/3503161.3547942 http://dx.doi.org/10.1145/3503161.3547942 ］

Peng D Z ， Xie C Y ， Li H L ， Jin L W ， Xie Z C ， Ding K ， Huang Y C and Wu Y Q . 2021 . Towards fast， accurate and compact online handwritten Chinese text recognition // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 157 - 171 ［ DOI： 10.1007/978-3-030-86334-0_11 http://dx.doi.org/10.1007/978-3-030-86334-0_11 ］

Plamondon R and Srihari S N . 2000 . Online and off-line handwriting recognition： a comprehensive survey . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 22 （ 1 ）： 63 - 84 ［ DOI： 10.1109/34.824821 http://dx.doi.org/10.1109/34.824821 ］

Qiao L ， Chen Y ， Cheng Z Z ， Xu L L ， Niu Y ， Pu S L and Wu F . 2021 . MANGO： a mask attention guided one-stage scene text spotter // Proceedings of the 35th AAAI Conference on Artificial Intelligence . ［s.l.］： AAAI： 2467 - 2476 ［ DOI： 10.1609/aaai.v35i3.16348 http://dx.doi.org/10.1609/aaai.v35i3.16348 ］

Renton G ， Soullard Y ， Chatelain C ， Adam S ， Kermorvant C and Paquet T . 2018 . Fully convolutional network with dilated convolutions for handwritten text line segmentation . International Journal on Document Analysis and Recognition （IJDAR）， 21 （ 3 ）： 177 - 186 ［ DOI： 10.1007/s10032-018-0304-3 http://dx.doi.org/10.1007/s10032-018-0304-3 ］

Riba P ， Dutta A ， Goldmann L ， Fornés A ， Ramos O and Lladós J . 2019 . Table detection in invoice documents by graph neural networks // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 122 - 127 ［ DOI： 10.1109/ICDAR.2019.00028 http://dx.doi.org/10.1109/ICDAR.2019.00028 ］

Saitoh T ， Tachikawa M and Yamaai T . 1993 . Document image segmentation and text area ordering // Proceedings of the 2nd International Conference on Document Analysis and Recognition . Tsukuba， Japan ： IEEE： 323 - 329 ［ DOI： 10.1109/ICDAR.1993.395722 http://dx.doi.org/10.1109/ICDAR.1993.395722 ］

Sauvola J and Pietikäinen M . 2000 . Adaptive document image binarization . Pattern Recognition ， 33 （ 2 ）： 225 - 236 ［ DOI： 10.1016/S0031-3203(99)00055-2 http://dx.doi.org/10.1016/S0031-3203(99)00055-2 ］

Sayre K M . 1973 . Machine recognition of handwritten words： a project report . Pattern Recognition ， 5 （ 3 ）： 213 - 228 ［ DOI： 10.1016/0031-3203(73)90044-7 http://dx.doi.org/10.1016/0031-3203(73)90044-7 ］

Schäfer B ， Keuper M and Stuckenschmidt H . 2021 . Arrow R-CNN for handwritten diagram recognition . International Journal on Document Analysis and Recognition （IJDAR）， 24 （ 1 ）： 3 - 17 ［ DOI： 10.1007/s10032-020-00361-1 http://dx.doi.org/10.1007/s10032-020-00361-1 ］

Shafait F ， Keysers D and Breuel T . 2008 . Performance evaluation and benchmarking of six page segmentation algorithms . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 30 （ 6 ）： 941 - 954 ［ DOI： 10.1109/TPAMI.2007.70837 http://dx.doi.org/10.1109/TPAMI.2007.70837 ］

Shi B G ， Bai X and Belongie S . 2017a . Detecting oriented text in natural images by linking segments // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 3482 - 3490 ［ DOI： 10.1109/CVPR.2017.371 http://dx.doi.org/10.1109/CVPR.2017.371 ］

Shi B G ， Bai X and Yao C . 2017b . An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 39 （ 11 ）： 2298 - 2304 ［ DOI： 10.1109/TPAMI.2016.2646371 http://dx.doi.org/10.1109/TPAMI.2016.2646371 ］

Shi B G ， Wang X G ， Lyu P ， Yao C and Bai X . 2016 . Robust scene text recognition with automatic rectification // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 4168 - 4176 ［ DOI： 10.1109/CVPR.2016.452 http://dx.doi.org/10.1109/CVPR.2016.452 ］

Shi B G ， Yang M K ， Wang X G ， Lyu P Y ， Yao C and Bai X . 2019 . ASTER： an attentional scene text recognizer with flexible rectification . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 41 （ 9 ）： 2035 - 2048 ［ DOI： 10.1109/TPAMI.2018.2848939 http://dx.doi.org/10.1109/TPAMI.2018.2848939 ］

Shi C ， Xu C H ， Bi H Y ， Cheng Y Z ， Li Y T and Zhang H H . 2022 . Lateral feature enhancement network for page object detection . IEEE Transactions on Instrumentation and Measurement ， 71 ： # 5020310 ［ DOI： 10.1109/TIM.2022.3201546 http://dx.doi.org/10.1109/TIM.2022.3201546 ］

Siddiqui S A ， Fateh I A ， Rizvi S T R ， Dengel A and Ahmed S . 2019 . DeepTabStr： deep learning based table structure recognition // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 1403 - 1409 ［ DOI： 10.1109/ICDAR.2019.00226 http://dx.doi.org/10.1109/ICDAR.2019.00226 ］

Song S B ， Wan J Q ， Yang Z B ， Tang J ， Cheng W Q ， Bai X and Yao C . 2022 . Vision-language pre-training for boosting scene text detectors // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 15660 - 15670 ［ DOI： 10.1109/CVPR52688.2022.01523 http://dx.doi.org/10.1109/CVPR52688.2022.01523 ］

Srihari S N . 1993 . Recognition of handwritten and machine-printed text for postal address interpretation . Pattern Recognition Letters ， 14 （ 4 ）： 291 - 302 ［ DOI： 10.1016/0167-8655(93)90095-U http://dx.doi.org/10.1016/0167-8655(93)90095-U ］

Suen C Y ， Berthod M and Mori S . 1980 . Automatic recognition of handprinted characters —— the state of the art . Proceedings of the IEEE ， 68 （ 4 ）： 469 - 487 ［ DOI： 10.1109/PROC.1980.11675 http://dx.doi.org/10.1109/PROC.1980.11675 ］

Suen C Y ， Lam L ， Guillevic G ， Strathy N W ， Cheriet M ， Said J N and Fan R . 1996 . Bank check processing system . International Journal of Imaging Systems and Technology ， 7 （ 4 ）： 392 - 403 ［ DOI： 10.1002/(SICI)1098-1098(199624)7:4<392::AID-IMA14>3.0.CO;2-Y http://dx.doi.org/10.1002/(SICI)1098-1098(199624)7:4<392::AID-IMA14>3.0.CO;2-Y ］

Sun Y P ， Ni Z H ， Chng C K ， Liu Y L ， Luo C J ， Ng C C ， Han J Y ， Ding E ， Liu J T ， Karatzas D ， Chan C S and Jin L W . 2019 . ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 1557 - 1562 ［ DOI： 10.1109/ICDAR.2019.00250 http://dx.doi.org/10.1109/ICDAR.2019.00250 ］

Tang G Z ， Xie L L ， Jin L W ， Wang J P ， Chen J D ， Xu Z ， Wang Q Y ， Wu Y Q and Li H . 2021 . MatchVIE： exploiting match relevancy between entities for visual information extraction // Proceedings of the 30th International Joint Conference on Artificial Intelligence . Montreal， Canada ： ijcai.org： 1039 - 1045 ［ DOI： 10.24963/ijcai.2021/144 http://dx.doi.org/10.24963/ijcai.2021/144 ］

Tang J ， Yang Z B ， Wang Y P ， Zheng Q ， Xu Y C and Bai X . 2019 . SegLink++： detecting dense and arbitrary-shaped scene text by instance-aware component grouping . Pattern Recognition ， 96 ： # 106954 ［ DOI： 10.1016/j.patcog.2019.06.020 http://dx.doi.org/10.1016/j.patcog.2019.06.020 ］

Tang J Q ， Zhang W Q ， Liu H Y ， Yang M K ， Jiang B ， Hu G L and Bai X . 2022 . Few could be better than all： feature sampling and grouping for scene text detection // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 4553 - 4562 ［ DOI： 10.1109/CVPR52688.2022.00452 http://dx.doi.org/10.1109/CVPR52688.2022.00452 ］

Tensmeyer C and Wigington C . 2019 . Training full-page handwritten text recognition models without annotated line breaks // Proceedings of the 15th International Conference on Document Analysis and Recognition . Sydney， Australia ： IEEE： 1 - 8 ［ DOI： 10.1109/ICDAR.2019.00011 http://dx.doi.org/10.1109/ICDAR.2019.00011 ］

Tian Z ， Huang W L ， He T ， He P and Qiao Y . 2016 . Detecting text in natural image with connectionist text proposal network // Proceedings of the 14th European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 56 - 72 ［ DOI： 10.1007/978-3-319-46484-8_4 http://dx.doi.org/10.1007/978-3-319-46484-8_4 ］

Tito R ， Mathew M ， Jawahar C V ， Valveny E and Karatzas D . 2021 . ICDAR 2021 competition on document visual question answering // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 635 - 649 ［ DOI： 10.1007/978-3-030-86337-1_42 http://dx.doi.org/10.1007/978-3-030-86337-1_42 ］

Wakahara T ， Murase H and Odaka K . 1992 . On-line handwriting recognition . Proceedings of the IEEE ， 80 （ 7 ）： 1181 - 1194 ［ DOI： 10.1109/5.156478 http://dx.doi.org/10.1109/5.156478 ］

Wang J P ， Liu C Y ， Jin L W ， Tang G Z ， Zhang J X ， Zhang S T ， Wang Q Y ， Wu Y Q and Cai M X . 2021a . Towards robust visual information extraction in real world： new dataset and novel solution // Proceedings of the 35th AAAI Conference on Artificial Intelligence . ［s.l.］： AAAI： 2738 - 2745 ［ DOI： 10.1609/aaai.v35i4.16378 http://dx.doi.org/10.1609/aaai.v35i4.16378 ］

Wang J P ， Jin L W and Ding K . 2022a . LiLT： a simple yet effective language-independent layout transformer for structured document understanding // Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics . Dublin， Ireland ： Association for Computational Linguistics： 7747 - 7757 ［ DOI： 10.18653/v1/2022.acl-long.534 http://dx.doi.org/10.18653/v1/2022.acl-long.534 ］

Wang Q F ， Yin F and Liu C L . 2012 . Handwritten Chinese text recognition by integrating multiple contexts . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 34 （ 8 ）： 1469 - 1481 ［ DOI： 10.1109/TPAMI.2011.264 http://dx.doi.org/10.1109/TPAMI.2011.264 ］

Wang T Q ， Jiang X Y and Liu C L . 2022b . Query pixel guided stroke extraction with model-based matching for offline handwritten Chinese characters . Pattern Recognition ， 123 ： # 108416 ［ DOI： 10.1016/j.patcog.2021.108416 http://dx.doi.org/10.1016/j.patcog.2021.108416 ］

Wang T W ， Zhu Y Z ， Jin L W ， Luo C J ， Chen X X ， Wu Y Q ， Wang Q Y and Cai M X . 2020a . Decoupled attention network for text recognition // Proceedings of the 34th AAAI Conference on Artificial Intelligence . New York， USA ： AAAI： 12216 - 12224 ［ DOI： 10.1609/aaai.v34i07.6903 http://dx.doi.org/10.1609/aaai.v34i07.6903 ］

Wang W H ， Xie E Z ， Li X ， Hou W B ， Lu T ， Yu G and Shao S . 2019 . Shape robust text detection with progressive scale expansion network // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 9328 - 9337 ［ DOI： 10.1109/CVPR.2019.00956 http://dx.doi.org/10.1109/CVPR.2019.00956 ］

Wang Y X ， Xie H T ， Zha Z J ， Xing M T ， Fu Z L and Zhang Y D . 2020b . ContourNet： taking a further step toward accurate arbitrary-shaped scene text detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 11750 - 11759 ［ DOI： 10.1109/CVPR42600.2020.01177 http://dx.doi.org/10.1109/CVPR42600.2020.01177 ］

Wang Z L ， Xu Y H ， Cui L ， Shang J B and Wei F R . 2021b . LayoutReader： pre-training of text and layout for reading order detection // Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing . Punta Cana， Dominican Republic ： Association for Computational Linguistics： 4735 - 4744 ［ DOI： 10.18653/v1/2021.emnlp-main.389 http://dx.doi.org/10.18653/v1/2021.emnlp-main.389 ］

Wigington C ， Tensmeyer C ， Davis B ， Barrett W ， Price B and Cohen S . 2018 . Start， follow， read： end-to-end full-page handwriting recognition // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 372 - 388 ［ DOI： 10.1007/978-3-030-01231-1_23 http://dx.doi.org/10.1007/978-3-030-01231-1_23 ］

Wong K Y ， Casey R G and Wahl F M . 1982 . Document analysis system . IBM Journal of Research and Development ， 26 （ 6 ）： 647 - 656 ［ DOI： 10.1147/rd.266.0647 http://dx.doi.org/10.1147/rd.266.0647 ］

Wu J W ， Yin F ， Zhang Y M ， Zhang X Y and Liu C L . 2020a . Handwritten mathematical expression recognition via paired adversarial learning . International Journal of Computer Vision ， 128 （ 10 ）： 2386 - 2401 ［ DOI： 10.1007/s11263-020-01291-5 http://dx.doi.org/10.1007/s11263-020-01291-5 ］

Wu J W ， Yin F ， Zhang Y M ， Zhang X Y and Liu C L . 2021 . Graph-to-graph： towards accurate and interpretable online handwritten mathematical expression recognition //Proceedings of the 35th AAAI Conference on Artificial Intelligence. ［s.l.］： AAAI： 2925 - 2933 ［ DOI： 10.1609/aaai.v35i4.16399 http://dx.doi.org/10.1609/aaai.v35i4.16399 ］

Wu S H ， Wang J P ， Ma W H and Jin L W . 2020b . Precise detection of Chinese characters in historical documents with deep reinforcement learning . Pattern Recognition ， 107 ： # 107503 ［ DOI： 10.1016/j.patcog.2020.107503 http://dx.doi.org/10.1016/j.patcog.2020.107503 ］

Wu Y C ， Yin F and Liu C L . 2017 . Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models . Pattern Recognition ， 65 ： 251 - 264 ［ DOI： 10.1016/j.patcog.2016.12.026 http://dx.doi.org/10.1016/j.patcog.2016.12.026 ］

Xiao X F ， Jin L W ， Yang Y F ， Yang W X ， Sun J and Chang T H . 2017 . Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition . Pattern Recognition ， 72 ： 72 - 81 ［ DOI： 10.1016/j.patcog.2017.06.032 http://dx.doi.org/10.1016/j.patcog.2017.06.032 ］

Xie G W ， Yin F ， Zhang X Y and Liu C L . 2021 . Document dewarping with control points // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 466 - 480 ［ DOI： 10.1007/978-3-030-86549-8_30 http://dx.doi.org/10.1007/978-3-030-86549-8_30 ］

Xie X D ， Fu L ， Zhang Z F ， Wang Z W and Bai X . 2022 . Toward understanding WordArt： corner-guided transformer for scene text recognition // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 303 - 321 ［ DOI： 10.1007/978-3-031-19815-1_18 http://dx.doi.org/10.1007/978-3-031-19815-1_18 ］

Xie Z C ， Sun Z H ， Jin L W ， Ni H and Lyons T . 2018 . Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 40 （ 8 ）： 1903 - 1917 ［ DOI： 10.1109/TPAMI.2017.2732978 http://dx.doi.org/10.1109/TPAMI.2017.2732978 ］

Xing L J ， Tian Z ， Huang W L and Scott M . 2019 . Convolutional character networks // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 9125 - 9135 ［ DOI： 10.1109/ICCV.2019.00922 http://dx.doi.org/10.1109/ICCV.2019.00922 ］

Xu Y ， Xu Y H ， Lv T C ， Cui L ， Wei F R ， Wang G X ， Lu Y J ， Florencio D ， Zhang C ， Che W X ， Zhang M and Zhou L D . 2021b . LayoutLMv2： multi-modal pre-training for visually-rich document understanding //Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. ［s.l.］： Association for Computational Linguistics： 2579 - 2591 ［ DOI： 10.18653/v1/2021.acl-long.201 http://dx.doi.org/10.18653/v1/2021.acl-long.201 ］

Xu Y C ， Wang Y K ， Zhou W ， Wang Y P ， Yang Z B and Bai X . 2019 . TextField： learning a deep direction field for irregular scene text detection . IEEE Transactions on Image Processing ， 28 （ 11 ）： 5566 - 5579 ［ DOI： 10.1109/TIP.2019.2900589 http://dx.doi.org/10.1109/TIP.2019.2900589 ］

Xu Y H ， Li M H ， Cui L ， Huang S H ， Wei F R and Zhou M . 2020 . LayoutLM： pre-training of text and layout for document image understanding //Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ［s.l.］： ACM： 1192 - 1200 ［ DOI： 10.1145/3394486.3403172 http://dx.doi.org/10.1145/3394486.3403172 ］

Xu Y H ， Lv T C ， Cui L ， Wang G X ， Lu Y J ， Florencio D ， Zhang C and Wei F R . 2021a . LayoutXLM： multimodal pre-training for multilingual visually-rich document understanding ［EB/OL］. ［ 2022-11-01 ］. https://arxiv.org/pdf/2104.08836.pdf

Xue C H ， Tian Z C ， Zhan F N ， Lu S J and Bai S . 2022 . Fourier document restoration for robust document dewarping and recognition // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 4563 - 4572 ［ DOI： 10.1109/CVPR52688.2022.00453 http://dx.doi.org/10.1109/CVPR52688.2022.00453 ］

Yaeger L S ， Webb B J and Lyon R F . 1998 . Combining neural networks and context-driven search for online， printed handwriting recognition in the Newton . AI Magazine ， 19 （ 1 ）： 73 - 89 ［ DOI： 10.1609/aimag.v19i1.1355 http://dx.doi.org/10.1609/aimag.v19i1.1355 ］

Yang M K ， Liao M H ， Lu P ， Wang J ， Zhu S G ， Luo H L ， Tian Q and Bai X . 2022 . Reading and writing： discriminative and generative modeling for self-supervised text recognition // Proceedings of the 30th ACM International Conference on Multimedia . Lisboa， Portugal ： ACM： 4214 - 4223 ［ DOI： 10.1145/3503161.3547784 http://dx.doi.org/10.1145/3503161.3547784 ］

Yang X ， Yumer E ， Asente P ， Kraley M ， Kifer D and Lee Giles C . 2017 . Learning to extract semantic structure from documents using multimodal fully convolutional neural networks // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 4342 - 4351 ［ DOI： 10.1109/CVPR.2017.462 http://dx.doi.org/10.1109/CVPR.2017.462 ］

Yao C ， Bai X ， Liu W Y ， Ma Y and Tu Z W . 2012 . Detecting texts of arbitrary orientations in natural images // Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition . Providence， USA ： IEEE： 1083 - 1090 ［ DOI： 10.1109/CVPR.2012.6247787 http://dx.doi.org/10.1109/CVPR.2012.6247787 ］

Ye Q X and Doermann D . 2015 . Text detection and recognition in imagery： a survey . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 37 （ 7 ）： 1480 - 1500 ［ DOI： 10.1109/TPAMI.2014.2366765 http://dx.doi.org/10.1109/TPAMI.2014.2366765 ］

Yepes A J ， Zhong P and Burdick D . 2021 . ICDAR 2021 competition on scientific literature parsing // Proceedings of the 16th International Conference on Document Analysis and Recognition . Lausanne， Switzerland ： Springer： 605 - 617 ［ DOI： 10.1007/978-3-030-86337-1_40 http://dx.doi.org/10.1007/978-3-030-86337-1_40 ］

Yin F and Liu C L . 2009 . Handwritten Chinese text line segmentation by clustering with distance metric learning . Pattern Recognition ， 42 （ 12 ）： 3146 - 3157 ［ DOI： 10.1016/j.patcog.2008.12.013 http://dx.doi.org/10.1016/j.patcog.2008.12.013 ］

Yin F ， Wu Y C ， Zhang X Y and Liu C L . 2017 . Scene text recognition with sliding convolutional character models ［EB/OL］. ［ 2022-11-01 ］. https://arxiv.org/pdf/1709.01727.pdf

Yin X C ， Yin X W ， Huang K Z and Hao H W . 2014 . Robust text detection in natural scene images . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 36 （ 5 ）： 970 - 983 ［ DOI： 10.1109/TPAMI.2013.182 http://dx.doi.org/10.1109/TPAMI.2013.182 ］

Yousef M and Bishop T E . 2020 . OrigamiNet： weakly-supervised， segmentation-free， one-step， full page text recognition by learning to unfold // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 14698 - 14707 ［ DOI： 10.1109/CVPR42600.2020.01472 http://dx.doi.org/10.1109/CVPR42600.2020.01472 ］

Yuan Y ， Liu X ， Dikubab W ， Liu H ， Ji Z L ， Wu Z Q and Bai X . 2022 . Syntax-aware network for handwritten mathematical expression recognition // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 4543 - 4552 ［ DOI： 10.1109/CVPR52688.2022.00451 http://dx.doi.org/10.1109/CVPR52688.2022.00451 ］

Yun X L, Zhang Y M, Yin F and Liu C L. 2022. Instance GNN: a learning framework for joint symbol segmentation and recognition in online handwritten diagrams. IEEE Transactions on Multimedia, 24: 2580-2594 [DOI: 10.1109/TMM.2021.3087000]

Zahour A, Taconet B, Mercy P and Ramdane S. 2001. Arabic hand-written text-line extraction // Proceedings of the 6th International Conference on Document Analysis and Recognition. Seattle, USA: IEEE: 281-285 [DOI: 10.1109/ICDAR.2001.953799]

Zhang J S, Du J and Dai L R. 2019b. Track, attend, and parse (TAP): an end-to-end framework for online handwritten mathematical expression recognition. IEEE Transactions on Multimedia, 21(1): 221-233 [DOI: 10.1109/TMM.2018.2844689]

Zhang J S, Du J, Yang Y X, Song Y Z, Wei S and Dai L R. 2020a. A tree-structured decoder for image-to-markup generation // Proceedings of the 37th International Conference on Machine Learning. [s.l.]: PMLR: 11076-11085

Zhang J S, Du J, Zhang S L, Liu D, Hu Y L, Hu J S, Wei S and Dai L R. 2017a. Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognition, 71: 196-206 [DOI: 10.1016/j.patcog.2017.06.017]

Zhang J X, Luo C J, Jin L W, Guo F J and Ding K. 2022a. Marior: margin removal and iterative content rectification for document dewarping in the wild // Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 2805-2815 [DOI: 10.1145/3503161.3548214]

Zhang L, Zhang Y and Tan C. 2008. An improved physically-based method for geometric restoration of distorted document images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4): 728-734 [DOI: 10.1109/TPAMI.2007.70831]

Zhang M L, Yin F, Hao Y H and Liu C L. 2022b. Plane geometry diagram parsing // Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: ijcai.org: 1636-1643 [DOI: 10.24963/ijcai.2022/228]

Zhang P, Xu Y L, Cheng Z Z, Pu S L, Lu J, Qiao L, Niu Y and Wu F. 2020b. TRIE: end-to-end text reading and information extraction for document understanding // Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 1413-1422 [DOI: 10.1145/3394171.3413900]

Zhang R, Zhou Y S, Jiang Q Y, Song Q, Li N, Zhou K, Wang L, Wang D, Liao M H, Yang M K, Bai X, Shi B G, Karatzas D, Lu S J and Jawahar C V. 2019a. ICDAR 2019 robust reading challenge on reading Chinese text on signboard // Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1577-1581 [DOI: 10.1109/ICDAR.2019.00253]

Zhang S X, Zhu X B, Hou J B, Liu C, Yang C, Wang H F and Yin X C. 2020c. Deep relational reasoning graph network for arbitrary shape text detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9696-9705 [DOI: 10.1109/CVPR42600.2020.00972]

Zhang S X, Zhu X B, Yang C, Wang H F and Yin X C. 2021a. Adaptive boundary proposal network for arbitrary shape text detection // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 1285-1294 [DOI: 10.1109/ICCV48922.2021.00134]

Zhang X, Su Y W, Tripathi S and Tu Z W. 2022c. Text spotting transformers // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 9509-9518 [DOI: 10.1109/CVPR52688.2022.00930]

Zhang X Y, Bengio Y and Liu C L. 2017b. Online and offline handwritten Chinese character recognition: a comprehensive study and new benchmark. Pattern Recognition, 61: 348-360 [DOI: 10.1016/j.patcog.2016.08.005]

Zhang Y K, Zhang H, Liu Y G and Liu C L. 2021b. Oracle character recognition based on cross-modal deep metric learning. Acta Automatica Sinica, 47(4): 791-800 [DOI: 10.16383/j.aas.c200443]

Zhang Z, Zhang C Q, Shen W, Yao C, Liu W Y and Bai X. 2016. Multi-oriented text detection with fully convolutional networks // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4159-4167 [DOI: 10.1109/CVPR.2016.451]

Zhao M B, Feng W, Yin F, Zhang X Y and Liu C L. 2022. Mixed-supervised scene text detection with expectation-maximization algorithm. IEEE Transactions on Image Processing, 31: 5513-5528 [DOI: 10.1109/TIP.2022.3197987]

Zhao W Q, Gao L C, Yan Z Y, Peng S, Du L and Zhang Z Y. 2021. Handwritten mathematical expression recognition with bidirectionally trained Transformer // Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 570-584 [DOI: 10.1007/978-3-030-86331-9_37]

Zhong X, ShafieiBavani E and Yepes A J. 2020. Image-based table recognition: data, model, and evaluation // Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 564-580 [DOI: 10.1007/978-3-030-58589-1_34]

Zhong X, Tang J B and Yepes A J. 2019. PubLayNet: largest dataset ever for document layout analysis // Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1015-1022 [DOI: 10.1109/ICDAR.2019.00166]

Zhou X D, Wang D H, Tian F, Liu C L and Nakagawa M. 2013. Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10): 2413-2426 [DOI: 10.1109/TPAMI.2013.49]

Zhou X Y, Yao C, Wen H, Wang Y Z, Zhou S C, He W R and Liang J J. 2017. EAST: an efficient and accurate scene text detector // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2642-2651 [DOI: 10.1109/CVPR.2017.283]

Zhu Y Q, Chen J Y, Liang L Y, Kuang Z H, Jin L W and Zhang W. 2021. Fourier contour embedding for arbitrary-shaped text detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 3122-3130 [DOI: 10.1109/CVPR46437.2021.00314]

Zhu Y Y, Yao C and Bai X. 2016. Scene text detection and recognition: recent advances and future trends. Frontiers of Computer Science, 10(1): 19-36 [DOI: 10.1007/s11704-015-4488-0]
