Frontiers of intelligent document analysis and recognition: review and prospects
Vol. 28, No. 8, 2023, pp. 2223-2252
Print publication date: 2023-08-16
DOI: 10.11834/jig.221112
Liu Chenglin, Jin Lianwen, Bai Xiang, Li Xiaohui, Yin Fei. 2023. Frontiers of intelligent document analysis and recognition: review and prospects. Journal of Image and Graphics, 28(08):2223-2252
Document analysis and recognition (document recognition for short) aims to convert unstructured documents (typically document images and online handwriting) into structured data that computers can readily process and understand. Because documents are used pervasively for communication, the technology has a very wide range of applications. Since the 1960s, document recognition has attracted intensive attention and made enormous progress in both research and applications. In particular, the recent development of deep learning has boosted recognition performance remarkably compared with traditional methods, and the technology has been applied successfully to document digitization, form processing, handwriting input, intelligent transportation, document retrieval, and information extraction. In this article, we first introduce the background of document recognition and the techniques it involves, and give an overview of the history of research (divided into four periods according to the research objects, methods, and applications). We then review the main research progress, with emphasis on deep learning based methods developed in recent years. After identifying the shortcomings of current technology, we finally suggest some important issues for future research. The review of recent progress is divided into sections corresponding to the main processing steps, namely image pre-processing, layout analysis, scene text detection, text recognition, structured symbol and graphics recognition, and document retrieval and information extraction.
1) Owing to the popularity of camera-captured document images, the main current task in image pre-processing is the rectification of distorted images, while binarization remains a concern. Recent methods are mostly end-to-end transformation methods based on deep learning. 2) Layout analysis is divided into physical layout analysis (page segmentation) and logical layout analysis (semantic region segmentation and reading order prediction). Recent page segmentation methods based on fully convolutional networks (FCNs) or graph neural networks (GNNs) have shown promise. Logical layout analysis has been addressed by deep neural networks that fuse multi-modal information. Table structure analysis is a special task of layout analysis and has been studied intensively in recent years. 3) Scene text detection is a hot topic in both document analysis and computer vision. Deep learning based text detection methods can be divided into regression-based, segmentation-based, and hybrid methods. FCNs are widely used to extract visual features, on which models are built to predict text regions. 4) Text recognition is the core task in document analysis. We review recent works on handwritten text recognition and scene text recognition, which share some common strategies but also show different preferences. There are two main streams of methods: segmentation-based methods and sequence-to-sequence learning methods. The convolutional recurrent neural network (CRNN) model has received much attention in recent years and is being extended in its encoding, decoding, and learning strategies, while segmentation-based methods combined with deep learning remain competitive. A noteworthy trend is the extension from text line recognition to page-level recognition. Following text recognition, we also review end-to-end scene text recognition (also called text spotting), in which text detection and recognition models are learned jointly.
5) Among the symbols and graphics in documents, mathematical expressions and flowcharts have received increasing attention. Recent methods for mathematical expression recognition are mostly image-to-markup generation methods using encoder-decoder models, while graph-based methods show promise in generating both recognition and segmentation results. Flowchart recognition is addressed with structured prediction models such as GNNs. 6) In the pre-deep-learning era, document retrieval was mainly concerned with keyword spotting, whereas recent works focus on information extraction (spotting semantic entities) by fusing layout and language information. Pre-trained layout and multi-modal language models show promise, although visual information is not yet adequately exploited.
Overall, recent progress shows that the objects of recognition have expanded in breadth and depth, the methods have shifted comprehensively to deep neural network models and deep learning, recognition performance keeps improving, and the technology is being applied to an expanding range of scenarios. The review also reveals the shortcomings of current technology in accuracy and reliability on various tasks, in interpretability, and in learning ability and adaptability.
Future work is suggested in three respects: performance improvement, application extension, and improved learning ability. Issues of performance improvement include recognition reliability, interpretability, omni-element recognition, long-tailed recognition, multi-lingual documents, complex layout analysis and understanding, and recognition of distorted documents. Issues related to applications include new applications (such as robotic process automation (RPA), text transcription in natural scenes, and archaeology) and new technical problems arising in applications (such as semantic information extraction, cross-modal fusion, and application-oriented reasoning and decision making).
To improve automatic system design, learning ability, and adaptability, the relevant learning problems and methods include few-shot learning, transfer learning, multi-task learning, domain adaptation, structured prediction, weakly supervised learning, self-supervised learning, open-set learning, and cross-modal learning.
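The abstract notes that the CRNN model has attracted much attention for text recognition. The original CRNN formulation is trained with connectionist temporal classification (CTC), which lets a network output one label distribution per image column without character-level segmentation; at inference time those per-frame outputs must be turned into a string. The following is a minimal sketch of greedy (best-path) CTC decoding in plain Python, assuming label index 0 is the blank; the alphabet and probability values are hypothetical and not taken from any system reviewed here.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding of per-frame label probabilities.

    probs[t][k] is the probability of label k at frame (image column) t;
    label 0 is the blank and label k >= 1 stands for alphabet[k - 1].
    """
    # Step 1: take the most likely label at every frame (the "best path").
    best_path: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in probs
    ]

    # Step 2: collapse consecutive repeats, then drop blanks. A blank between
    # two identical labels keeps them apart, so "a a - a" decodes to "aa".
    chars: List[str] = []
    previous: Optional[int] = None
    for label in best_path:
        if label != previous and label != BLANK:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


if __name__ == "__main__":
    # Made-up 5-frame output over the labels {blank, 'a', 'b', 'c'}.
    frames = [
        [0.1, 0.7, 0.1, 0.1],    # 'a'
        [0.1, 0.6, 0.2, 0.1],    # 'a' again (repeat, collapsed)
        [0.8, 0.1, 0.05, 0.05],  # blank
        [0.1, 0.7, 0.1, 0.1],    # 'a' (new character after the blank)
        [0.2, 0.1, 0.6, 0.1],    # 'b'
    ]
    print(ctc_greedy_decode(frames, "abc"))  # prints "aab"
```

Greedy decoding returns the single most probable frame path rather than the most probable label sequence; practical recognizers often use beam search, optionally constrained by a lexicon or language model, which this sketch omits.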
Keywords: document analysis and recognition; document intelligence; layout analysis; text detection; text recognition; graphics and symbol recognition; document information extraction
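The abstract notes that binarization is still a concern in image pre-processing even though rectification now dominates. For reference, the classical global approach is Otsu's method, which picks the gray level that maximizes the between-class variance of the ink and paper pixels. The sketch below illustrates that traditional baseline only; it is not one of the deep learning methods reviewed in the paper. It assumes an 8-bit grayscale image stored as a list of rows of ints and dark text on light paper, and the sample pixel values are made up.

```python
from typing import List, Sequence


def otsu_threshold(gray: Sequence[Sequence[int]]) -> int:
    """Global threshold t in [0, 255] chosen by Otsu's criterion.

    Pixels <= t form the dark class (ink), pixels > t the light class
    (paper). If the image has a single gray level, no split exists and
    0 is returned.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("empty image")
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the split is undefined
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance up to the constant factor 1 / total**2,
        # which does not change the argmax.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(gray: Sequence[Sequence[int]], t: int) -> List[List[int]]:
    """Map ink pixels (<= t) to 0 and background pixels (> t) to 1."""
    return [[0 if value <= t else 1 for value in row] for row in gray]


if __name__ == "__main__":
    # Tiny made-up "page": three dark ink pixels and three light paper pixels.
    page = [[10, 12, 200], [11, 210, 205]]
    t = otsu_threshold(page)
    print(t)                  # prints 12
    print(binarize(page, t))  # prints [[0, 0, 1], [0, 1, 1]]
```

A single global threshold struggles with uneven illumination, shadows, and degraded paper, which is one reason locally adaptive and learned binarization methods continue to be studied.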
Aberdam A, Litman R, Tsiper S, Anschel O, Slossberg R, Mazor S, Manmatha R and Perona P. 2021. Sequence-to-sequence contrastive learning for text recognition//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15297-15307 [DOI: 10.1109/CVPR46437.2021.01505http://dx.doi.org/10.1109/CVPR46437.2021.01505]
Almazn J, Gordo A, Fornés A and Valveny E. 2014. Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12): 2552-2566 [DOI: 10.1109/TPAMI.2014.2339814http://dx.doi.org/10.1109/TPAMI.2014.2339814]
Álvaro F, Snchez J A and Benedí J M. 2014. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recognition Letters, 35: 58-67 [DOI: 10.1016/j.patrec.2012.09.023http://dx.doi.org/10.1016/j.patrec.2012.09.023]
Álvaro F, Snchez J A and Benedí J M. 2016. An integrated grammar-based approach for mathematical expression recognition. Pattern Recognition, 51: 135-147 [DOI: 10.1016/j.patcog.2015.09.013http://dx.doi.org/10.1016/j.patcog.2015.09.013]
Ao X, Zhang X Y and Liu C L. 2022. Cross-modal prototype learning for zero-shot handwritten character recognition. Pattern Recognition, 131: #108859 [DOI: 10.1016/j.patcog.2022.108859http://dx.doi.org/10.1016/j.patcog.2022.108859]
Baek Y, Lee B, Han D, Yun S and Lee H. 2019. Character region awareness for text detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9357-9366 [DOI: 10.1109/CVPR.2019.00959http://dx.doi.org/10.1109/CVPR.2019.00959]
Baek Y, Shin S, Baek J, Park S, Lee J, Nam D and Lee H. 2020. Character region attention for text spotting//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 504-521 [DOI: 10.1007/978-3-030-58526-6_30http://dx.doi.org/10.1007/978-3-030-58526-6_30]
Bautista D and Atienza R. 2022. Scene text recognition with permuted autoregressive sequence models//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 178-196 [DOI: 10.1007/978-3-031-19815-1_11http://dx.doi.org/10.1007/978-3-031-19815-1_11]
Bian X H, Qin B, Xin X Z, Li J W, Su X F and Wang Y F. 2022. Handwritten mathematical expression recognition via attention aggregation based bi-directional mutual learning//Proceedings of the 36th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 113-121 [DOI: 10.1609/aaai.v36i1.19885http://dx.doi.org/10.1609/aaai.v36i1.19885]
Binmakhashen G M and Mahmoud S A. 2019. Document layout analysis: a comprehensive survey. ACM Computing Surveys, 52(6): #109 [DOI: 10.1145/3355610http://dx.doi.org/10.1145/3355610]
Bissacco A, Cummins M, Netzer Y and Neven H. 2013. PhotoOCR: reading text in uncontrolled conditions//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 785-792 [DOI: 10.1109/ICCV.2013.102http://dx.doi.org/10.1109/ICCV.2013.102]
Bluche T, Louradour J and Messina R O. 2017. Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 1050-1055 [DOI: 10.1109/ICDAR.2017.174http://dx.doi.org/10.1109/ICDAR.2017.174]
Bozinovic R and Srihari S N. 1982. A string correction algorithm for cursive script recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4(6): 655-663 [DOI: 10.1109/TPAMI.1982.4767321http://dx.doi.org/10.1109/TPAMI.1982.4767321]
Bresler M, Průša D and Hlavč V. 2016. Online recognition of sketched arrow-connected diagrams. International Journal on Document Analysis and Recognition (IJDAR), 19(3): 253-267 [DOI: 10.1007/s10032-016-0269-zhttp://dx.doi.org/10.1007/s10032-016-0269-z]
Brown M S and Seales W B. 2001. Document restoration using 3D shape: a general deskewing algorithm for arbitrarily warped documents//Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, Canada: IEEE: 367-374 [DOI: 10.1109/ICCV.2001.937649http://dx.doi.org/10.1109/ICCV.2001.937649]
Brunessaux S, Giroux P, Grilhères B, Manta M, Bodin M, Choukri K, Galibert O and Kahn J. 2014. The Maurdor project: improving automatic processing of digital documents//Proceedings of the 11th IAPR International Workshop on Document Analysis Systems. Tours, France: IEEE: 349-354 [DOI: 10.1109/DAS.2014.58http://dx.doi.org/10.1109/DAS.2014.58]
Cao H G, Bhardwaj A and Govindaraju V. 2009a. A probabilistic method for keyword retrieval in handwritten document images. Pattern Recognition, 42(12): 3374-3382 [DOI: 10.1016/j.patcog.2009.02.003http://dx.doi.org/10.1016/j.patcog.2009.02.003]
Cao H G. and Govindaraju V. 2009b. Preprocessing of low-quality handwritten documents using Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(7): 1184-1194 [DOI: 10.1109/TPAMI.2008.126http://dx.doi.org/10.1109/TPAMI.2008.126]
Carbune V, Gonnet P, Deselaers T, Rowley H A, Daryin A, Calvo M, Wang L L, Keysers D, Feuz S and Gervais P. 2020. Fast multi-language LSTM-based online handwriting recognition. International Journal on Document Analysis and Recognition (IJDAR), 23(2): 89-102 [DOI: 10.1007/s10032-020-00350-4http://dx.doi.org/10.1007/s10032-020-00350-4]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13http://dx.doi.org/10.1007/978-3-030-58452-8_13]
Casey R G and Lecolinet E. 1996. A survey of methods and strategies in character segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7): 690-706 [DOI: 10.1109/34.506792http://dx.doi.org/10.1109/34.506792]
Casey R G and Nagy G. 1982. Recursive segmentation and classification of composite character patterns//Proceedings of the 6th International Conference on Pattern Recognition. Munich, Germany: [s.n.]: 1023-1026
Ch’ng C K, Chan C S and Liu C L. 2020. Total-text: toward orientation robustness in scene text detection. International Journal on Document Analysis and Recognition (IJDAR), 23(1): 31-52 [DOI: 10.1007/s10032-019-00334-zhttp://dx.doi.org/10.1007/s10032-019-00334-z]
Chen X X, Jin L W, Zhu Y Z, Luo C J and Wang T W. 2021. Text recognition in the wild: a survey. ACM Computing Surveys, 54(2): #42 [DOI: 10.1145/3440756http://dx.doi.org/10.1145/3440756]
Chen Z, Yin F, Yang Q and Liu C L. 2022. Cross-lingual text image recognition via multi-hierarchy cross-modal mimic. IEEE Transactions on Multimedia [DOI: 10.1109/TMM.2022.3183386http://dx.doi.org/10.1109/TMM.2022.3183386]
Chen Z, Yin F, Zhang X Y, Yang Q and Liu C L. 2020. MuLTReNets: Multilingual text recognition networks for simultaneous script identification and handwriting recognition. Pattern Recognition, 108: #107555 [DOI: 10.1016/j.patcog.2020.107555http://dx.doi.org/10.1016/j.patcog.2020.107555]
Cheng Z Z, Bai F, Xu L L, Zheng G, Pu S L and Zhou S G. 2017. Focusing attention: towards accurate text recognition in natural images//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5086-5094 [DOI: 10.1109/ICCV.2017.543http://dx.doi.org/10.1109/ICCV.2017.543]
Chow C K. 1957. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, EC-6(4): 247-254 [DOI: 10.1109/TEC.1957.5222035http://dx.doi.org/10.1109/TEC.1957.5222035]
Ch’ng C K, Liu Y L, Sun Y P, Ng C C, Luo C J, Ni Z H, Fang C M, Zhang S T, Han J Y, Ding E, Liu J T, Karatzas D, Chan C S and Jin L W. 2019. ICDAR2019 robust reading challenge on arbitrary-shaped text — RRC-ArT//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1571-1576 [DOI: 10.1109/ICDAR.2019.00252http://dx.doi.org/10.1109/ICDAR.2019.00252]
Coquenet D, Chatelain C and Paquet T. 2023. End-to-end handwritten paragraph text recognition using a vertical attention network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1): 508-524 [DOI: 10.1109/TPAMI.2022.3144899http://dx.doi.org/10.1109/TPAMI.2022.3144899]
Costagliola G, de Rosa M and Fuccella V. 2014. Local context-based recognition of sketched diagrams. Journal of Visual Languages and Computing, 25(6): 955-962 [DOI: 10.1016/j.jvlc.2014.10.021http://dx.doi.org/10.1016/j.jvlc.2014.10.021]
Cui L, Xu Y H, Lv T C and Wei F R. 2021. Document AI: benchmarks, models and applications [EB/OL]. [2022-11-01]. https://arxiv.org/pdf/2111.08609.pdfhttps://arxiv.org/pdf/2111.08609.pdf
Das S, Ma K, Shu Z X, Samaras D and Shilkrot R. 2019. DewarpNet: single-image document unwarping with stacked 3D and 2D regression networks//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 131-140 [DOI: 10.1109/ICCV.2019.00022http://dx.doi.org/10.1109/ICCV.2019.00022]
Davila L, Setlur S, Doermann D, Kota B U and Govindaraju V. 2021. Chart mining: a survey of methods for automated chart analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11): 3799-3819 [DOI: 10.1109/TPAMI.2020.2992028http://dx.doi.org/10.1109/TPAMI.2020.2992028]
Deng D, Liu H F, Li X L and Cai D. 2018. PixelLink: detecting scene text via instance segmentation//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI [DOI: 10.1609/aaai.v32i1.12269http://dx.doi.org/10.1609/aaai.v32i1.12269]
Deng Y T, Kanervisto A, Ling J and Rush A M. 2017. Image-to-markup generation with coarse-to-fine attention//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org: 980-989 [DOI: 10.5555/3305381.3305483http://dx.doi.org/10.5555/3305381.3305483]
Denk T I and Reisswig C. 2019. BERTgrid: contextualized embedding for 2D document representation and understanding//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS
Devlin J, Chang M W, Lee K and Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding//Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, USA: Association for Computational Linguistics: 4171-4186 [DOI: 10.18653/v1/N19-1423http://dx.doi.org/10.18653/v1/N19-1423]
Ding H S, Chen K and Huo Q. 2021. An encoder-decoder approach to handwritten mathematical expression recognition with multi-head attention and stacked decoder//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 602-616 [DOI: 10.1007/978-3-030-86331-9_39http://dx.doi.org/10.1007/978-3-030-86331-9_39]
Doermann D. 1998. The indexing and retrieval of document images: a survey. Computer Vision and Image Understanding, 70(3): 287-298 [DOI: 10.1006/cviu.1998.0692http://dx.doi.org/10.1006/cviu.1998.0692]
Du Y K, Chen Z N, Jia C Y, Yin X T, Zheng T L, Li C X, Du Y N and Jiang Y G. 2022. SVTR: scene text recognition with a single visual model//Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: ijcai.org [DOI: 10.24963/ijcai.2022/124http://dx.doi.org/10.24963/ijcai.2022/124]
Epshtein B, Ofek E and Wexler Y. 2010. Detecting text in natural scenes with stroke width transform//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 2963-2970 [DOI: 10.1109/CVPR.2010.5540041http://dx.doi.org/10.1109/CVPR.2010.5540041]
Fang S C, Xie H T, Wang Y X, Mao Z D and Zhang Y D. 2021. Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: #702 [DOI: 10.1109/CVPR46437.2021.00702http://dx.doi.org/10.1109/CVPR46437.2021.00702]
Feng W, He W H, Yin F, Zhang X Y and Liu C L. 2019. TextDragon: an end-to-end framework for arbitrary shaped text spotting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9075-9084 [DOI: 10.1109/ICCV.2019.00917http://dx.doi.org/10.1109/ICCV.2019.00917]
Fujisawa H. 2008. Forty years of research in character and document recognition —— an industrial perspective. Pattern Recognition, 41(8): 2435-2446 [DOI: 10.1016/j.patcog.2008.03.015http://dx.doi.org/10.1016/j.patcog.2008.03.015]
Gao L C, Huang Y L, Déjean H, Meunier J L, Yan Q Q, Fang Y, Kleber F and Lang E. 2019. ICDAR 2019 competition on table detection and recognition (cTDaR)//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1510-1515 [DOI: 10.1109/ICDAR.2019.00243http://dx.doi.org/10.1109/ICDAR.2019.00243]
Gao L C, Li Y B, Du L, Zhang X P, Zhu Z Y, Lu N, Jin L W, Huang Y S and Tang Z. 2022. A survey on table recognition technology. Journal of Image and Graphics, 27(6): 1898-1917
高良才, 李一博, 都林, 张新鹏, 朱子仪, 卢宁, 金连文, 黄永帅, 汤帜. 2022. 表格识别技术研究进展. 中国图象图形学报, 27(6): 1898-1917 [DOI: 10.11834/jig.220152http://dx.doi.org/10.11834/jig.220152]
Gao L C, Yi X H, Jiang Z R, Hao L P and Tang Z. 2017. ICDAR2017 competition on page object detection//Proceedings of the 14th International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 1417-1422 [DOI: 10.1109/ICDAR.2017.231http://dx.doi.org/10.1109/ICDAR.2017.231]
Gorski N, Anisimov V, Augustin E, Baret O, Price D and Simon J C. 1999. A2iA check reader: a family of bank check recognition systems//Proceedings of the 5th International Conference on Document Analysis and Recognition. Bangalore, India: IEEE: 523-526 [DOI: 10.1109/ICDAR.1999.791840http://dx.doi.org/10.1109/ICDAR.1999.791840]
Graves A, Fernandez S, Gomez F and Schmidhuber J. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks//Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, USA: ACM: 369-376 [DOI: 10.1145/1143844.1143891http://dx.doi.org/10.1145/1143844.1143891]
Graves A, Liwicki M, Fernndez S, Bertolami R, Bunke H and Schmidhuber J. 2009. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5): 855-868 [DOI: 10.1109/TPAMI.2008.137http://dx.doi.org/10.1109/TPAMI.2008.137]
He M H, Liao M H, Yang Z B, Zhong H M, Tang J, Cheng W Q, Yao C, Wang Y P and Bai X. 2021. MOST: a multi-oriented scene text detector with localization refinement//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 8809-8818 [DOI: 10.1109/CVPR46437.2021.00870http://dx.doi.org/10.1109/CVPR46437.2021.00870]
He W H, Zhang X Y, Yin F and Liu C L. 2018. Multi-oriented and multi-lingual scene text detection with direct regression. IEEE Transactions on Image Processing, 27(11): 5406-5419 [DOI: 10.1109/TIP.2018.2855399http://dx.doi.org/10.1109/TIP.2018.2855399]
Hinton G E, Osindero S and Teh Y W. 2006. A fast learning algorithm for deep belief nets. Neural Computation, 18(7): 1527-1554 [DOI: 10.1162/neco.2006.18.7.1527http://dx.doi.org/10.1162/neco.2006.18.7.1527]
Huang L, Yin F, Chen Q H and Liu C L. 2013. Keyword spotting in unconstrained handwritten Chinese documents using contextual word model. Image and Vision Computing, 31(12): 958-968 [DOI: 10.1016/j.imavis.2013.10.003http://dx.doi.org/10.1016/j.imavis.2013.10.003]
Huang M X, Liu Y L, Peng Z H, Liu C Y, Lin D H, Zhu S G, Yuan N, Ding K and Jin L W. 2022a. SwinTextSpotter: scene text spotting via better synergy between text detection and text recognition//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4583-4593 [DOI: 10.1109/CVPR52688.2022.00455http://dx.doi.org/10.1109/CVPR52688.2022.00455]
Huang Y P, Lv T C, Cui L, Lu Y T and Wei F R. 2022b. LayoutLMv3: pre-training for document AI with unified text and image masking//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 4083-4091 [DOI: 10.1145/3503161.3548112http://dx.doi.org/10.1145/3503161.3548112]
Huang Z H, Xu W and Yu K. 2015. Bidirectional LSTM-CRF models for sequence tagging [EB/OL]. [2022-11-01]. https://arxiv.org/pdf/1508.01991.pdfhttps://arxiv.org/pdf/1508.01991.pdf
Jain K, Namboodiri A M and Subrahmonia J. 2001. Structure in on-line documents//Proceedings of the 6th International Conference on Document Analysis and Recognition. Seattle, USA: IEEE: 844-848 [DOI: 10.1109/ICDAR.2001.953906http://dx.doi.org/10.1109/ICDAR.2001.953906]
Julca-Aguilar F, Mouchère H, Viard-Gaudin C and Hirata N S T. 2020. A general framework for the recognition of online handwritten graphics. International Journal on Document Analysis and Recognition (IJDAR), 23(2): 143-160 [DOI: 10.1007/s10032-019-00349-6http://dx.doi.org/10.1007/s10032-019-00349-6]
Julca-Aguilar F D and Hirata N S T. 2018. Symbol detection in online handwritten graphics using faster R-CNN//Proceedings of the 13th IAPR International Workshop on Document Analysis Systems. Vienna, Austria: IEEE: 151-156 [DOI: 10.1109/DAS.2018.79http://dx.doi.org/10.1109/DAS.2018.79]
Kang L, Riba P, Rusiñol M, Fornés A and Villegas M. 2022. Pay attention to what you read: non-recurrent handwritten text-line recognition, Pattern Recognition, 129: #108766 [DOI: 10.1016/j.patcog.2022.108766http://dx.doi.org/10.1016/j.patcog.2022.108766]
Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar V R, Lu S J, Shafait F, Uchida S and Valveny E. 2015. ICDAR 2015 competition on robust reading//Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis, Tunisia: IEEE: 1156-1160 [DOI: 10.1109/ICDAR.2015.7333942http://dx.doi.org/10.1109/ICDAR.2015.7333942]
Katti A R, Reisswig C, Guder C, Brarda S, Bickel S, Höhne J and Faddoul J B. 2018. Chargrid: towards understanding 2D documents//Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics: 4459-4469 [DOI: 10.18653/v1/D18-1476http://dx.doi.org/10.18653/v1/D18-1476]
Kim G, Hong T, Yim M, Nam J, Park J, Yim J, Hwang W, Yun S, Han D and Park S. 2022. OCR-free document understanding transformer//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 498-517 [DOI: 10.1007/978-3-031-19815-1_29http://dx.doi.org/10.1007/978-3-031-19815-1_29]
Kimura F, Takashina K, Tsuruoka S and Miyake Y. 1987. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(1): 149-153 [DOI: 10.1109/TPAMI.1987.4767881http://dx.doi.org/10.1109/TPAMI.1987.4767881]
Kise K, Sato A and Iwata M. 1998. Segmentation of page images using the area Voronoi diagram. Computer Vision and Image Understanding, 70(3): 370-382 [DOI: 10.1006/cviu.1998.0684http://dx.doi.org/10.1006/cviu.1998.0684]
Krishnamoorthy M, Nagy G, Seth S and Viswanathan M. 1993. Syntactic segmentation and labeling of digitized pages from technical journals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(7): 737-747 [DOI: 10.1109/34.221173http://dx.doi.org/10.1109/34.221173]
Le A D, Indurkhya B and Nakagawa M. 2019. Pattern generation strategies for improving recognition of handwritten mathematical expressions. Pattern Recognition Letters, 128: 255-262 [DOI: 10.1016/j.patrec.2019.09.002http://dx.doi.org/10.1016/j.patrec.2019.09.002]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324 [DOI: 10.1109/5.726791http://dx.doi.org/10.1109/5.726791]
Li B H, Yuan Y, Liang D K, Liu X, Ji Z L, Bai J F, Liu W Y and Bai Z. 2022a. When counting meets HMER: counting-aware network for handwritten mathematical expression recognition//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 197-214 [DOI: 10.1007/978-3-031-19815-1_12http://dx.doi.org/10.1007/978-3-031-19815-1_12]
Li H, Wang P and Shen C H. 2017. Towards end-to-end text spotting with convolutional recurrent neural networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5248-5256 [DOI: 10.1109/ICCV.2017.560http://dx.doi.org/10.1109/ICCV.2017.560]
Li H, Wang P, Shen C H and Zhang G Y. 2019a. Show, attend and read: a simple and strong baseline for irregular text recognition//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 8610-8617 [DOI: 10.1609/aaai.v33i01.33018610http://dx.doi.org/10.1609/aaai.v33i01.33018610]
Li X H, Yin F, Dai H S and Liu C L. 2022b. Table structure recognition and form parsing by end-to-end object detection and relation parsing. Pattern Recognition, 132: #108946 [DOI: 10.1016/j.patcog.2022.108946http://dx.doi.org/10.1016/j.patcog.2022.108946]
Li X H, Yin F and Liu C L. 2020. Page segmentation using convolutional neural network and graphical model//Proceedings of the 14th IAPR International Workshop on Document Analysis Systems. Wuhan, China: Springer: 231-245 [DOI: 10.1007/978-3-030-57058-3_17http://dx.doi.org/10.1007/978-3-030-57058-3_17]
Li X H, Yin F, Xue T, Liu L, Ogier J M and Liu C L. 2019b. Instance aware document image segmentation using label pyramid networks and deep watershed transformation//Proceedings of 2019 International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 514-519 [DOI: 10.1109/ICDAR.2019.00088http://dx.doi.org/10.1109/ICDAR.2019.00088]
Li Y B, Huang Y L, Zhu Z Y, Pan L M, Huang Y S, Du L, Tang Z and Gao L C. 2021. Rethinking table structure recognition using sequence labeling methods//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 541-553 [DOI: 10.1007/978-3-030-86331-9_35http://dx.doi.org/10.1007/978-3-030-86331-9_35]
Liao M H, Lyu P Y, He M H, Yao C, Wu W H and Bai X. 2021. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2): 532-548 [DOI: 10.1109/TPAMI.2019.2937086http://dx.doi.org/10.1109/TPAMI.2019.2937086]
Liao M H, Pang G, Huang J, Hassner T and Bai X. 2020a. Mask TextSpotter v3: segmentation proposal network for robust scene text spotting//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 706-722 [DOI: 10.1007/978-3-030-58621-8_41http://dx.doi.org/10.1007/978-3-030-58621-8_41]
Liao M H, Shi B G and Bai X. 2018. Textboxes++: a single-shot oriented scene text detector. IEEE Transactions on Image Processing, 27(8): 3676-3690 [DOI: 10.1109/TIP.2018.2825107http://dx.doi.org/10.1109/TIP.2018.2825107]
Liao M H, Shi B G, Bai X, Wang W G and Liu W Y. 2017. Textboxes: a fast text detector with a single deep neural network//Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, USA: AAAI: 4161-4167 [DOI: 10.1609/aaai.v31i1.11196http://dx.doi.org/10.1609/aaai.v31i1.11196]
Liao M H, Wan Z Y, Yao C, Chen K and Bai X. 2020b. Real-time scene text detection with differentiable binarization//Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA: AAAI: 11474-11481 [DOI: 10.1609/aaai.v34i07.6812http://dx.doi.org/10.1609/aaai.v34i07.6812]
Liao M H, Zou Z S, Wan Z Y, Yao C and Bai X. 2023. Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1): 919-931 [DOI: 10.1109/TPAMI.2022.3155612http://dx.doi.org/10.1109/TPAMI.2022.3155612]
Lin W H, Gao Q F, Sun L, Zhong Z Y, Hu K, Ren Q and Huo Q. 2021. ViBERTgrid: a jointly trained multi-modal 2D document representation for key information extraction from documents//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 548-563 [DOI: 10.1007/978-3-030-86549-8_35http://dx.doi.org/10.1007/978-3-030-86549-8_35]
Lin W H, Sun Z, Ma C X, Li M Z, Wang J W, Sun L and Huo Q. 2022. TSRFormer: Table structure recognition with Transformers//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 6473-6482 [DOI: 10.1145/3503161.3548038http://dx.doi.org/10.1145/3503161.3548038]
Liu C, Yang C and Yin X C. 2022a. Open-set text recognition via character-context decoupling//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4513-4522 [DOI: 10.1109/CVPR52688.2022.00448]
Liu C L. 2019. Document image recognition: retrospective and perspective of technology. Frontiers of Data and Computing, 1(2): 17-25 (in Chinese) [DOI: 10.11871/jfdc.issn.2096-742X.2019.02.002]
Liu C L, Yin F, Wang D H and Wang Q F. 2011. CASIA online and offline Chinese handwriting databases//Proceedings of the 11th International Conference on Document Analysis and Recognition. Beijing, China: IEEE: 37-41 [DOI: 10.1109/ICDAR.2011.17]
Liu C L, Yin F, Wang D H and Wang Q F. 2013. Online and offline handwritten Chinese character recognition: benchmarking on new databases. Pattern Recognition, 46(1): 155-162 [DOI: 10.1016/j.patcog.2012.06.021]
Liu C Y, Chen X X, Luo C J, Jin L W, Xue Y and Liu Y L. 2021. Deep learning methods for scene text detection and recognition. Journal of Image and Graphics, 26(6): 1330-1367 (in Chinese) [DOI: 10.11834/jig.210044]
Liu H, Li X, Liu B, Jiang D Q, Liu Y S, Ren B and Ji R R. 2021. Show, read and reason: table structure recognition with flexible context aggregator//Proceedings of the 29th ACM International Conference on Multimedia. [s.l.]: ACM: 1084-1092 [DOI: 10.1145/3474085.3481534]
Liu X B, Liang D, Yan S, Chen D G, Qiao Y and Yan J J. 2018. FOTS: fast oriented text spotting with a unified network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5676-5685 [DOI: 10.1109/CVPR.2018.00595]
Liu X J, Gao F Y, Zhang Q and Zhao H S. 2019a. Graph convolution for multimodal information extraction from visually rich documents//Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota, USA: Association for Computational Linguistics: 32-39 [DOI: 10.18653/v1/N19-2005]
Liu Y L, Chen H, Shen C H, He T, Jin L W and Wang L W. 2020. ABCNet: real-time scene text spotting with adaptive Bezier-curve network//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9806-9815 [DOI: 10.1109/CVPR42600.2020.00983]
Liu Y L and Jin L W. 2017. Deep matching prior network: toward tighter multi-oriented text detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3454-3461 [DOI: 10.1109/CVPR.2017.368]
Liu Y L, Jin L W, Zhang S T, Luo C J and Zhang S. 2019b. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition, 90: 337-345 [DOI: 10.1016/j.patcog.2019.02.002]
Liu Y L, Shen C H, Jin L W, He T, Chen P, Liu C Y and Chen H. 2022b. ABCNet v2: adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11): 8048-8064 [DOI: 10.1109/TPAMI.2021.3107437]
Long S B, He X and Yao C. 2021. Scene text detection and recognition: the deep learning era. International Journal of Computer Vision, 129: 161-184 [DOI: 10.1007/s11263-020-01369-0]
Lucas S M, Panaretos A, Sosa L, Tang A, Wong S and Young R. 2003. ICDAR 2003 robust reading competitions//Proceedings of the 7th International Conference on Document Analysis and Recognition. Edinburgh, UK: IEEE: 682-687 [DOI: 10.1109/ICDAR.2003.1227749]
Luo C J, Jin L W and Chen J D. 2022. SimAN: exploring self-supervised representation learning of scene text via similarity-aware normalization//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 1029-1038 [DOI: 10.1109/CVPR52688.2022.00111]
Luo C J, Jin L W and Sun Z H. 2019. MORAN: a multi-object rectified attention network for scene text recognition. Pattern Recognition, 90: 109-118 [DOI: 10.1016/j.patcog.2019.01.020]
Luong M T, Nguyen T D and Kan M Y. 2010. Logical structure recovery in scholarly articles with rich document features. International Journal of Digital Library Systems, 1(4): 1-23
Ma K, Das S, Shu Z X and Samaras D. 2022. Learning from documents in the wild to improve document unwarping//ACM SIGGRAPH 2022 Conference Proceedings. Vancouver, Canada: ACM: #34 [DOI: 10.1145/3528233.3530756]
MacLean S and Labahn G. 2013. A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets. International Journal on Document Analysis and Recognition (IJDAR), 16(2): 139-163 [DOI: 10.1007/s10032-012-0184-x]
Mahdavi M, Zanibbi R, Mouchere H, Viard-Gaudin C and Garain U. 2019. ICDAR 2019 CROHME + TFD: competition on recognition of handwritten mathematical expressions and typeset formula detection//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1533-1538 [DOI: 10.1109/ICDAR.2019.00247]
Manke S, Finke M and Waibel A. 1995. NPen++: a writer independent, large vocabulary on-line cursive handwriting recognition system//Proceedings of the 3rd International Conference on Document Analysis and Recognition. Montreal, Canada: IEEE: 403-408 [DOI: 10.1109/ICDAR.1995.599023]
Marti U V and Bunke H. 2002. The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1): 39-46 [DOI: 10.1007/s100320200071]
Matas J, Chum O, Urban M and Pajdla T. 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10): 761-767 [DOI: 10.1016/j.imavis.2004.02.006]
Meng G F, Pan C H, Xiang S M, Duan J Y and Zheng N N. 2012. Metric rectification of curved document images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4): 707-722 [DOI: 10.1109/TPAMI.2011.151]
Meng G F, Xiang S M, Zheng N N and Pan C H. 2013. Nonparametric illumination correction for scanned document images via convex hulls. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7): 1730-1743 [DOI: 10.1109/TPAMI.2012.251]
Mermelstein P and Eden M. 1964. Experiments on computer recognition of connected handwritten words. Information and Control, 7(2): 255-270 [DOI: 10.1016/S0019-9958(64)90142-1]
Mitra M and Chaudhuri B B. 2000. Information retrieval from documents: a survey. Information Retrieval, 2(2): 141-163 [DOI: 10.1023/A:1009950525500]
Mori S, Suen C Y and Yamamoto K. 1992. Historical review of OCR research and development. Proceedings of the IEEE, 80(7): 1029-1058 [DOI: 10.1109/5.156468]
Motahari H, Duffy N, Bennett P and Bedrax-Weiss T. 2020. A report on the first workshop on document intelligence (DI) at NeurIPS 2019. ACM SIGKDD Explorations Newsletter, 22(2): 8-11 [DOI: 10.1145/3447556.3447563]
Moysset B, Kermorvant C and Wolf C. 2017. Full-page text recognition: learning where to start and when to stop//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 871-876 [DOI: 10.1109/ICDAR.2017.147]
Murase H. 1988. Online recognition of free-format Japanese handwritings//Proceedings of the 9th International Conference on Pattern Recognition. Rome, Italy: IEEE: 1143-1147 [DOI: 10.1109/ICPR.1988.28462]
Nagy G. 2000. Twenty years of document image analysis in PAMI. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1): 38-62 [DOI: 10.1109/34.824820]
Nagy G. 2016. Disruptive developments in document recognition. Pattern Recognition Letters, 79: 106-112 [DOI: 10.1016/j.patrec.2015.11.024]
Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z B, Pal U, Rigaud C, Chazalon J, Khlif W, Luqman M M, Burie J C, Liu C L and Ogier J M. 2017. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 1454-1459 [DOI: 10.1109/ICDAR.2017.237]
O'Gorman L. 1993. The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11): 1162-1173 [DOI: 10.1109/34.244677]
Pan Y F, Hou X W and Liu C L. 2011. A hybrid approach to detect and localize texts in natural scene images. IEEE Transactions on Image Processing, 20(3): 800-813 [DOI: 10.1109/TIP.2010.2070803]
Peng D Z, Jin L W, Liu Y L, Luo C J and Lai S X. 2022c. PageNet: towards end-to-end weakly supervised page-level handwritten Chinese text recognition. International Journal of Computer Vision, 130(11): 2623-2645 [DOI: 10.1007/s11263-022-01654-0]
Peng D Z, Jin L W, Ma W H, Xie C Y, Zhang H S, Zhu S G and Li J. 2022a. Recognition of handwritten Chinese text by segmentation: a segment-annotation-free approach. IEEE Transactions on Multimedia [DOI: 10.1109/TMM.2022.3146771]
Peng D Z, Wang X Y, Liu Y L, Zhang J X, Huang M X, Lai S X, Li J, Zhu S G, Lin D H, Shen C H, Bai X and Jin L W. 2022b. SPTS: single-point text spotting//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 4272-4281 [DOI: 10.1145/3503161.3547942]
Peng D Z, Xie C Y, Li H L, Jin L W, Xie Z C, Ding K, Huang Y C and Wu Y Q. 2021. Towards fast, accurate and compact online handwritten Chinese text recognition//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 157-171 [DOI: 10.1007/978-3-030-86334-0_11]
Plamondon R and Srihari S N. 2000. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1): 63-84 [DOI: 10.1109/34.824821]
Qiao L, Chen Y, Cheng Z Z, Xu L L, Niu Y, Pu S L and Wu F. 2021. MANGO: a mask attention guided one-stage scene text spotter//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 2467-2476 [DOI: 10.1609/aaai.v35i3.16348]
Renton G, Soullard Y, Chatelain C, Adam S, Kermorvant C and Paquet T. 2018. Fully convolutional network with dilated convolutions for handwritten text line segmentation. International Journal on Document Analysis and Recognition (IJDAR), 21(3): 177-186 [DOI: 10.1007/s10032-018-0304-3]
Riba P, Dutta A, Goldmann L, Fornés A, Ramos O and Lladós J. 2019. Table detection in invoice documents by graph neural networks//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 122-127 [DOI: 10.1109/ICDAR.2019.00028]
Saitoh T, Tachikawa M and Yamaai T. 1993. Document image segmentation and text area ordering//Proceedings of the 2nd International Conference on Document Analysis and Recognition. Tsukuba, Japan: IEEE: 323-329 [DOI: 10.1109/ICDAR.1993.395722]
Sauvola J and Pietikäinen M. 2000. Adaptive document image binarization. Pattern Recognition, 33(2): 225-236 [DOI: 10.1016/S0031-3203(99)00055-2]
Sayre K M. 1973. Machine recognition of handwritten words: a project report. Pattern Recognition, 5(3): 213-228 [DOI: 10.1016/0031-3203(73)90044-7]
Schäfer B, Keuper M and Stuckenschmidt H. 2021. Arrow R-CNN for handwritten diagram recognition. International Journal on Document Analysis and Recognition (IJDAR), 24(1): 3-17 [DOI: 10.1007/s10032-020-00361-1]
Shafait F, Keysers D and Breuel T. 2008. Performance evaluation and benchmarking of six-page segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(6): 941-954 [DOI: 10.1109/TPAMI.2007.70837]
Shi B G, Bai X and Belongie S. 2017a. Detecting oriented text in natural images by linking segments//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3482-3490 [DOI: 10.1109/CVPR.2017.371]
Shi B G, Bai X and Yao C. 2017b. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11): 2298-2304 [DOI: 10.1109/TPAMI.2016.2646371]
Shi B G, Wang X G, Lyu P, Yao C and Bai X. 2016. Robust scene text recognition with automatic rectification//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4168-4176 [DOI: 10.1109/CVPR.2016.452]
Shi B G, Yang M K, Wang X G, Lyu P Y, Yao C and Bai X. 2019. ASTER: an attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9): 2035-2048 [DOI: 10.1109/TPAMI.2018.2848939]
Shi C, Xu C H, Bi H Y, Cheng Y Z, Li Y T and Zhang H H. 2022. Lateral feature enhancement network for page object detection. IEEE Transactions on Instrumentation and Measurement, 71: #5020310 [DOI: 10.1109/TIM.2022.3201546]
Siddiqui S A, Fateh I A, Rizvi S T R, Dengel A and Ahmed S. 2019. DeepTabStr: deep learning based table structure recognition//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1403-1409 [DOI: 10.1109/ICDAR.2019.00226]
Song S B, Wan J Q, Yang Z B, Tang J, Cheng W Q, Bai X and Yao C. 2022. Vision-language pre-training for boosting scene text detectors//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 15660-15670 [DOI: 10.1109/CVPR52688.2022.01523]
Srihari S N. 1993. Recognition of handwritten and machine-printed text for postal address interpretation. Pattern Recognition Letters, 14(4): 291-302 [DOI: 10.1016/0167-8655(93)90095-U]
Suen C Y, Berthod M and Mori S. 1980. Automatic recognition of handprinted characters: the state of the art. Proceedings of the IEEE, 68(4): 469-487 [DOI: 10.1109/PROC.1980.11675]
Suen C Y, Lam L, Guillevic G, Strathy N W, Cheriet M, Said J N and Fan R. 1996. Bank check processing system. International Journal of Imaging Systems and Technology, 7(4): 392-403 [DOI: 10.1002/(SICI)1098-1098(199624)7:4<392::AID-IMA14>3.0.CO;2-Y]
Sun Y P, Ni Z H, Chng C K, Liu Y L, Luo C J, Ng C C, Han J Y, Ding E, Liu J T, Karatzas D, Chan C S and Jin L W. 2019. ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1557-1562 [DOI: 10.1109/ICDAR.2019.00250]
Tang G Z, Xie L L, Jin L W, Wang J P, Chen J D, Xu Z, Wang Q Y, Wu Y Q and Li H. 2021. MatchVIE: exploiting match relevancy between entities for visual information extraction//Proceedings of the 30th International Joint Conference on Artificial Intelligence. Montreal, Canada: ijcai.org: 1039-1045 [DOI: 10.24963/ijcai.2021/144]
Tang J, Yang Z B, Wang Y P, Zheng Q, Xu Y C and Bai X. 2019. SegLink++: detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition, 96: #106954 [DOI: 10.1016/j.patcog.2019.06.020]
Tang J Q, Zhang W Q, Liu H Y, Yang M K, Jiang B, Hu G L and Bai X. 2022. Few could be better than all: feature sampling and grouping for scene text detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4553-4562 [DOI: 10.1109/CVPR52688.2022.00452]
Tensmeyer C and Wigington C. 2019. Training full-page handwritten text recognition models without annotated line breaks//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1-8 [DOI: 10.1109/ICDAR.2019.00011]
Tian Z, Huang W L, He T, He P and Qiao Y. 2016. Detecting text in natural image with connectionist text proposal network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 56-72 [DOI: 10.1007/978-3-319-46484-8_4]
Tito R, Mathew M, Jawahar C V, Valveny E and Karatzas D. 2021. ICDAR 2021 competition on document visual question answering//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 635-649 [DOI: 10.1007/978-3-030-86337-1_42]
Wakahara T, Murase H and Odaka K. 1992. On-line handwriting recognition. Proceedings of the IEEE, 80(7): 1181-1194 [DOI: 10.1109/5.156478]
Wang J P, Liu C Y, Jin L W, Tang G Z, Zhang J X, Zhang S T, Wang Q Y, Wu Y Q and Cai M X. 2021a. Towards robust visual information extraction in real world: new dataset and novel solution//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 2738-2745 [DOI: 10.1609/aaai.v35i4.16378]
Wang J P, Jin L W and Ding K. 2022a. LiLT: a simple yet effective language-independent layout transformer for structured document understanding//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland: Association for Computational Linguistics: 7747-7757 [DOI: 10.18653/v1/2022.acl-long.534]
Wang Q F, Yin F and Liu C L. 2012. Handwritten Chinese text recognition by integrating multiple contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(8): 1469-1481 [DOI: 10.1109/TPAMI.2011.264]
Wang T Q, Jiang X Y and Liu C L. 2022b. Query pixel guided stroke extraction with model-based matching for offline handwritten Chinese characters. Pattern Recognition, 123: #108416 [DOI: 10.1016/j.patcog.2021.108416]
Wang T W, Zhu Y Z, Jin L W, Luo C J, Chen X X, Wu Y Q, Wang Q Y and Cai M X. 2020a. Decoupled attention network for text recognition//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12216-12224 [DOI: 10.1609/aaai.v34i07.6903]
Wang W H, Xie E Z, Li X, Hou W B, Lu T, Yu G and Shao S. 2019. Shape robust text detection with progressive scale expansion network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9328-9337 [DOI: 10.1109/CVPR.2019.00956]
Wang Y X, Xie H T, Zha Z J, Xing M T, Fu Z L and Zhang Y D. 2020b. ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11750-11759 [DOI: 10.1109/CVPR42600.2020.01177]
Wang Z L, Xu Y H, Cui L, Shang J B and Wei F R. 2021b. LayoutReader: pre-training of text and layout for reading order detection//Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana, Dominican Republic: Association for Computational Linguistics: 4735-4744 [DOI: 10.18653/v1/2021.emnlp-main.389]
Wigington C, Tensmeyer C, Davis B, Barrett W, Price B and Cohen S. 2018. Start, follow, read: end-to-end full-page handwriting recognition//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 372-388 [DOI: 10.1007/978-3-030-01231-1_23]
Wong K Y, Casey R G and Wahl F M. 1982. Document analysis system. IBM Journal of Research and Development, 26(6): 647-656 [DOI: 10.1147/rd.266.0647]
Wu J W, Yin F, Zhang Y M, Zhang X Y and Liu C L. 2020a. Handwritten mathematical expression recognition via paired adversarial learning. International Journal of Computer Vision, 128(10): 2386-2401 [DOI: 10.1007/s11263-020-01291-5]
Wu J W, Yin F, Zhang Y M, Zhang X Y and Liu C L. 2021. Graph-to-graph: towards accurate and interpretable online handwritten mathematical expression recognition//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 2925-2933 [DOI: 10.1609/aaai.v35i4.16399]
Wu S H, Wang J P, Ma W H and Jin L W. 2020b. Precise detection of Chinese characters in historical documents with deep reinforcement learning. Pattern Recognition, 107: #107503 [DOI: 10.1016/j.patcog.2020.107503]
Wu Y C, Yin F and Liu C L. 2017. Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models. Pattern Recognition, 65: 251-264 [DOI: 10.1016/j.patcog.2016.12.026]
Xiao X F, Jin L W, Yang Y F, Yang W X, Sun J and Chang T H. 2017. Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition. Pattern Recognition, 72: 72-81 [DOI: 10.1016/j.patcog.2017.06.032]
Xie G W, Yin F, Zhang X Y and Liu C L. 2021. Document dewarping with control points//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 466-480 [DOI: 10.1007/978-3-030-86549-8_30]
Xie X D, Fu L, Zhang Z F, Wang Z W and Bai X. 2022. Toward understanding WordArt: corner-guided transformer for scene text recognition//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 303-321 [DOI: 10.1007/978-3-031-19815-1_18]
Xie Z C, Sun Z H, Jin L W, Ni H and Lyons T. 2018. Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(8): 1903-1917 [DOI: 10.1109/TPAMI.2017.2732978]
Xing L J, Tian Z, Huang W L and Scott M. 2019. Convolutional character networks//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9125-9135 [DOI: 10.1109/ICCV.2019.00922]
Xu Y, Xu Y H, Lv T C, Cui L, Wei F R, Wang G X, Lu Y J, Florencio D, Zhang C, Che W X, Zhang M and Zhou L D. 2021b. LayoutLMv2: multi-modal pre-training for visually-rich document understanding//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. [s.l.]: Association for Computational Linguistics: 2579-2591 [DOI: 10.18653/v1/2021.acl-long.201]
Xu Y C, Wang Y K, Zhou W, Wang Y P, Yang Z B and Bai X. 2019. TextField: learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing, 28(11): 5566-5579 [DOI: 10.1109/TIP.2019.2900589]
Xu Y H, Li M H, Cui L, Huang S H, Wei F R and Zhou M. 2020. LayoutLM: pre-training of text and layout for document image understanding//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [s.l.]: ACM: 1192-1200 [DOI: 10.1145/3394486.3403172]
Xu Y H, Lv T C, Cui L, Wang G X, Lu Y J, Florencio D, Zhang C and Wei F R. 2021a. LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding [EB/OL]. [2022-11-01]. https://arxiv.org/pdf/2104.08836.pdf
Xue C H, Tian Z C, Zhan F N, Lu S J and Bai S. 2022. Fourier document restoration for robust document dewarping and recognition//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4563-4572 [DOI: 10.1109/CVPR52688.2022.00453]
Yaeger L S, Webb B J and Lyon R F. 1998. Combining neural networks and context-driven search for online, printed handwriting recognition in the Newton. AI Magazine, 19(1): 73-89 [DOI: 10.1609/aimag.v19i1.1355]
Yang M K, Liao M H, Lu P, Wang J, Zhu S G, Luo H L, Tian Q and Bai X. 2022. Reading and writing: discriminative and generative modeling for self-supervised text recognition//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 4214-4223 [DOI: 10.1145/3503161.3547784]
Yang X, Yumer E, Asente P, Kraley M, Kifer D and Lee Giles C. 2017. Learning to extract semantic structure from documents using multimodal fully convolutional neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4342-4351 [DOI: 10.1109/CVPR.2017.462]
Yao C, Bai X, Liu W Y, Ma Y and Tu Z W. 2012. Detecting texts of arbitrary orientations in natural images//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 1083-1090 [DOI: 10.1109/CVPR.2012.6247787]
Ye Q X and Doermann D. 2015. Text detection and recognition in imagery: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7): 1480-1500 [DOI: 10.1109/TPAMI.2014.2366765]
Yepes A J, Zhong P and Burdick D. 2021. ICDAR 2021 competition on scientific literature parsing//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 605-617 [DOI: 10.1007/978-3-030-86337-1_40]
Yin F and Liu C L. 2009. Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognition, 42(12): 3146-3157 [DOI: 10.1016/j.patcog.2008.12.013]
Yin F, Wu Y C, Zhang X Y and Liu C L. 2017. Scene text recognition with sliding convolutional character models [EB/OL]. [2022-11-01]. https://arxiv.org/pdf/1709.01727.pdf
Yin X C, Yin X W, Huang K Z and Hao H W. 2014. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5): 970-983 [DOI: 10.1109/TPAMI.2013.182]
Yousef M and Bishop T E. 2020. OrigamiNet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14698-14707 [DOI: 10.1109/CVPR42600.2020.01472]
Yuan Y, Liu X, Dikubab W, Liu H, Ji Z L, Wu Z Q and Bai X. 2022. Syntax-aware network for handwritten mathematical expression recognition//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4543-4552 [DOI: 10.1109/CVPR52688.2022.00451]
Yun X L, Zhang Y M, Yin F and Liu C L. 2022. Instance GNN: a learning framework for joint symbol segmentation and recognition in online handwritten diagrams. IEEE Transactions on Multimedia, 24: 2580-2594 [DOI: 10.1109/TMM.2021.3087000]
Zahour A, Taconet B, Mercy P and Ramdane S. 2001. Arabic hand-written text-line extraction//Proceedings of the 6th International Conference on Document Analysis and Recognition. Seattle, USA: IEEE: 281-285 [DOI: 10.1109/ICDAR.2001.953799]
Zhang J S, Du J and Dai L R. 2019b. Track, attend, and parse (TAP): an end-to-end framework for online handwritten mathematical expression recognition. IEEE Transactions on Multimedia, 21(1): 221-233 [DOI: 10.1109/TMM.2018.2844689]
Zhang J S, Du J, Yang Y X, Song Y Z, Wei S and Dai L R. 2020a. A tree-structured decoder for image-to-markup generation//Proceedings of the 37th International Conference on Machine Learning. [s.l.]: PMLR: 11076-11085
Zhang J S, Du J, Zhang S L, Liu D, Hu Y L, Hu J S, Wei S and Dai L R. 2017a. Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognition, 71: 196-206 [DOI: 10.1016/j.patcog.2017.06.017]
Zhang J X, Luo C J, Jin L W, Guo F J and Ding K. 2022a. Marior: margin removal and iterative content rectification for document dewarping in the wild//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM: 2805-2815 [DOI: 10.1145/3503161.3548214]
Zhang L, Zhang Y and Tan C. 2008. An improved physically-based method for geometric restoration of distorted document images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4): 728-734 [DOI: 10.1109/TPAMI.2007.70831]
Zhang M L, Yin F, Hao Y H and Liu C L. 2022b. Plane geometry diagram parsing//Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: ijcai.org: 1636-1643 [DOI: 10.24963/ijcai.2022/228]
Zhang P, Xu Y L, Cheng Z Z, Pu S L, Lu J, Qiao L, Niu Y and Wu F. 2020b. TRIE: end-to-end text reading and information extraction for document understanding//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 1413-1422 [DOI: 10.1145/3394171.3413900]
Zhang R, Zhou Y S, Jiang Q Y, Song Q, Li N, Zhou K, Wang L, Wang D, Liao M H, Yang M K, Bai X, Shi B G, Karatzas D, Lu S J and Jawahar C V. 2019a. ICDAR 2019 robust reading challenge on reading Chinese text on signboard//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1577-1581 [DOI: 10.1109/ICDAR.2019.00253]
Zhang S X, Zhu X B, Hou J B, Liu C, Yang C, Wang H F and Yin X C. 2020c. Deep relational reasoning graph network for arbitrary shape text detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9696-9705 [DOI: 10.1109/CVPR42600.2020.00972]
Zhang S X, Zhu X B, Yang C, Wang H F and Yin X C. 2021a. Adaptive boundary proposal network for arbitrary shape text detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 1285-1294 [DOI: 10.1109/ICCV48922.2021.00134]
Zhang X, Su Y W, Tripathi S and Tu Z W. 2022c. Text spotting transformers//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 9509-9518 [DOI: 10.1109/CVPR52688.2022.00930]
Zhang X Y, Bengio Y and Liu C L. 2017b. Online and offline handwritten Chinese character recognition: a comprehensive study and new benchmark. Pattern Recognition, 61: 348-360 [DOI: 10.1016/j.patcog.2016.08.005]
Zhang Y K, Zhang H, Liu Y G and Liu C L. 2021b. Oracle character recognition based on cross-modal deep metric learning. Acta Automatica Sinica, 47(4): 791-800 (in Chinese) [DOI: 10.16383/j.aas.c200443]
Zhang Z, Zhang C Q, Shen W, Yao C, Liu W Y and Bai X. 2016. Multi-oriented text detection with fully convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4159-4167 [DOI: 10.1109/CVPR.2016.451]
Zhao M B, Feng W, Yin F, Zhang X Y and Liu C L. 2022. Mixed-supervised scene text detection with expectation-maximization algorithm. IEEE Transactions on Image Processing, 31: 5513-5528 [DOI: 10.1109/TIP.2022.3197987]
Zhao W Q, Gao L C, Yan Z Y, Peng S, Du L and Zhang Z Y. 2021. Handwritten mathematical expression recognition with bidirectionally trained Transformer//Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer: 570-584 [DOI: 10.1007/978-3-030-86331-9_37]
Zhong X, ShafieiBavani E and Yepes A J. 2020. Image-based table recognition: data, model, and evaluation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 564-580 [DOI: 10.1007/978-3-030-58589-1_34]
Zhong X, Tang J B and Yepes A J. 2019. PubLayNet: largest dataset ever for document layout analysis//Proceedings of the 15th International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1015-1022 [DOI: 10.1109/ICDAR.2019.00166]
Zhou X D, Wang D H, Tian F, Liu C L and Nakagawa M. 2013. Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10): 2413-2426 [DOI: 10.1109/TPAMI.2013.49]
Zhou X Y, Yao C, Wen H, Wang Y Z, Zhou S C, He W R and Liang J J. 2017. EAST: an efficient and accurate scene text detector//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2642-2651 [DOI: 10.1109/CVPR.2017.283]
Zhu Y Q, Chen J Y, Liang L Y, Kuang Z H, Jin L W and Zhang W. 2021. Fourier contour embedding for arbitrary-shaped text detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 3122-3130 [DOI: 10.1109/CVPR46437.2021.00314]
Zhu Y Y, Yao C and Bai X. 2016. Scene text detection and recognition: recent advances and future trends. Frontiers of Computer Science, 10(1): 19-36 [DOI: 10.1007/s11704-015-4488-0]