表格识别技术研究进展
A survey on table recognition technology
2022, Vol. 27, No. 6, Pages 1898-1917
Print publication date: 2022-06-16
Accepted: 2022-03-30
DOI: 10.11834/jig.220152
高良才, 李一博, 都林, 张新鹏, 朱子仪, 卢宁, 金连文, 黄永帅, 汤帜. 表格识别技术研究进展[J]. 中国图象图形学报, 2022,27(6):1898-1917.
Liangcai Gao, Yibo Li, Lin Du, Xinpeng Zhang, Ziyi Zhu, Ning Lu, Lianwen Jin, Yongshuai Huang, Zhi Tang. A survey on table recognition technology[J]. Journal of Image and Graphics, 2022,27(6):1898-1917.
Tables are ubiquitous in scientific literature, financial statements, newspapers, magazines, and many other kinds of documents, where they store and present data compactly and carry a wealth of useful information. Table recognition is the foundation for reusing tabular information; it has significant application value and has long been a research focus in pattern recognition. With the development of deep learning, new studies and new methods for table recognition have emerged in quick succession. Nevertheless, because tables appear in a wide range of scenarios, come in many styles, and vary greatly in image quality, a large number of problems in table recognition remain to be solved. To summarize previous work and support follow-up research, this paper surveys the development history and latest progress of the field at home and abroad around the three subtasks of table region detection, table structure recognition, and table content recognition, covering both traditional and deep learning methods. It also reviews the datasets and evaluation standards related to table recognition and, based on mainstream datasets and standards, compares the performance of representative methods for table region detection, structure recognition, and table information extraction. It then contrasts the progress and level of domestic research with that abroad. Finally, in light of the main difficulties and challenges currently facing table recognition, it offers an outlook on future research trends and technical development goals.
Extracting information from massive data and making it readily accessible has become an essential technology. Tables are an efficient structure for organizing, displaying, and analyzing clustered data, and their simplicity and intuitiveness have led to wide use on the Internet and in vertical domains. However, when tables are carried as images or portable document format (PDF) files, their structural information is lost, the original tables are difficult to recover, and inefficient manual re-entry introduces errors. Research over the past two decades has therefore focused on automatically recognizing tables originating from document images or PDF files. Table recognition aims to detect tables in images, PDFs, and other electronic files automatically, recover their structure and content, and extract specific information; it comprises three subtasks: table area detection, table structure recognition, and table content recognition. Existing table recognition methods fall into two broad categories. One applies optical character recognition (OCR) to recognize the characters in the table directly and then analyzes their positions to infer structure. The other uses digital image processing to obtain the key intersections and the positions of each frameline of the table and then analyzes the relationships between cells. Most of these methods are applicable only to a single domain, generalize poorly, and are constrained by experience-based threshold design. Thanks to the development of deep learning, semantic segmentation, object detection, text sequence generation, pre-trained models, and related techniques have been brought to bear on table recognition. Most deep learning algorithms are adapted to the characteristics of tables, which improves recognition performance: object detection algorithms are used for the table detection task, object detection and text sequence generation algorithms are mainly used for table structure recognition, and pre-trained models perform well for table content recognition. Nevertheless, many table structure recognition algorithms still handle borderless tables and tables with few rulings poorly, and for table images captured in natural scenes the relevant algorithms struggle in practice because of variations in brightness and inclination. A large number of datasets currently provide sufficient data for training table recognition models and improve their performance, but these datasets use multiple annotation formats and different evaluation metrics. In table structure recognition, some datasets provide only the hypertext markup language (HTML) code of the structure, while others provide the locations of cells together with their row and column attributes; correspondingly, some evaluation metrics are based on the positions or contents of cells, while others are based on the adjacency relationships between cells or the edit distance between HTML codes. Our survey critically reviews the state of research on the three subtasks of table detection, structure recognition, and content recognition, and discusses future research directions.
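To make the classical frameline-based approach mentioned above more concrete, the following is a minimal sketch, not a method from any of the surveyed papers. It assumes OpenCV is available and that the input is a single-channel image with dark rulings on light paper; the kernel scale and threshold parameters are illustrative assumptions only.

    import cv2

    def table_framelines(gray):
        # Binarize with dark strokes as foreground (white on black).
        binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, 15, -2)
        scale = 20  # assumed ratio controlling the minimum ruling length
        h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT,
                                             (max(gray.shape[1] // scale, 1), 1))
        v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT,
                                             (1, max(gray.shape[0] // scale, 1)))
        # Opening with long, thin kernels keeps only horizontal / vertical rulings.
        horizontal = cv2.dilate(cv2.erode(binary, h_kernel), h_kernel)
        vertical = cv2.dilate(cv2.erode(binary, v_kernel), v_kernel)
        # Pixels present in both maps are the key intersections (candidate cell corners).
        intersections = cv2.bitwise_and(horizontal, vertical)
        return horizontal, vertical, intersections

Similarly, the adjacency-relation style of structure evaluation mentioned above can be illustrated with a small sketch; this is not the official ICDAR/cTDaR evaluation tool. Cells are assumed to be given as (row, column, text) triples; relations between horizontally and vertically adjacent cells are collected for the predicted and ground-truth tables and compared to obtain an F1 score.

    from collections import Counter

    def adjacency_relations(cells):
        # cells: iterable of (row, col, text) triples for one table.
        grid = {(r, c): t for r, c, t in cells}
        rels = Counter()
        for (r, c), t in grid.items():
            if (r, c + 1) in grid:
                rels[(t, grid[(r, c + 1)], "horizontal")] += 1
            if (r + 1, c) in grid:
                rels[(t, grid[(r + 1, c)], "vertical")] += 1
        return rels

    def relation_f1(pred_cells, gt_cells):
        pred, gt = adjacency_relations(pred_cells), adjacency_relations(gt_cells)
        correct = sum((pred & gt).values())  # multiset intersection of relations
        p = correct / max(sum(pred.values()), 1)
        r = correct / max(sum(gt.values()), 1)
        return 2 * p * r / (p + r) if p + r else 0.0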
关键词: 表格区域检测; 表格结构识别; 表格内容识别; 深度学习; 单元格识别; 表格信息抽取
Keywords: table area detection; table structure recognition; table content recognition; deep learning; table cell recognition; table information extraction