A complex background-related binarization method for document-contextual information processing
2023, Vol. 28, No. 7, Pages 2011-2025
Print publication date: 2023-07-16
DOI: 10.11834/jig.220098
Wang Hongxia, Wu Jiali, Chen Deshan. 2023. A complex background-related binarization method for document-contextual information processing. Journal of Image and Graphics, 28(07): 2011-2025
Objective
Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so distinguishing complex backgrounds that share similar low-level features with text is a pressing problem in binarization. To address the separation of complex backgrounds in document image binarization, this paper proposes a two-stage binarization method that separates complex backgrounds from document images.
Method
The method consists of two processing stages: screening of easily misclassified pixels and binarization segmentation. Two networks with different structures are built according to the division of labor between the stages: the former strengthens the ability to recognize and separate easily misclassified pixels in complex backgrounds, while the latter focuses on the accurate prediction of text pixels, thereby improving the performance of the whole binarization method on images with complex backgrounds. Since each network is dedicated to its own task, both can perform well with a compressed number of parameters, further improving network efficiency. In addition, to enhance the handling of fine text details, an asymmetric encoder-decoder structure is proposed and two combination schemes are given.
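To make the asymmetric encoder-decoder idea concrete, the following is a minimal PyTorch sketch in which each encoder block stacks three convolutions while each decoder block keeps a single convolution. The block counts, channel widths, and skip-connection style are illustrative assumptions, not the configuration published with the paper.

```python
# Minimal sketch of an asymmetric encoder-decoder (assumed layout, for illustration only).
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Three stacked conv-BN-ReLU layers: the 'heavy' encoder side."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        layers = []
        for i in range(3):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                       nn.BatchNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class DecoderBlock(nn.Module):
    """Upsample, fuse the skip feature, then a single conv: the 'light' decoder side."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2)
        self.conv = nn.Sequential(nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
                                  nn.BatchNorm2d(out_ch),
                                  nn.ReLU(inplace=True))

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))

class AsymmetricUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=1):
        super().__init__()
        self.enc1 = EncoderBlock(in_ch, 32)
        self.enc2 = EncoderBlock(32, 64)
        self.enc3 = EncoderBlock(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.dec2 = DecoderBlock(128, 64, 64)
        self.dec1 = DecoderBlock(64, 32, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(e3, e2)
        d1 = self.dec1(d2, e1)
        return torch.sigmoid(self.head(d1))  # per-pixel text probability

if __name__ == "__main__":
    net = AsymmetricUNet()
    print(net(torch.randn(1, 1, 128, 128)).shape)  # torch.Size([1, 1, 128, 128])
```

Stacking extra convolutions on the encoder side enlarges the receptive field and adds non-linearity where features are extracted, while the lighter decoder keeps the parameter count down.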
Result
Experiments compare the proposed method with other methods on the DIBCO2016, DIBCO2017, and DIBCO2018 datasets of the competition on document image binarization (DIBCO). On DIBCO2018, the proposed method achieves an F-measure (FM) of 92.35%, only 0.17% lower than a method that relies on special preprocessing, and its overall performance is better than that of the other methods. On DIBCO2017 and DIBCO2016, the FM values are 93.46% and 92.13%, respectively, and the overall performance is the best among all compared methods. The experimental results also show that the asymmetric encoder-decoder structure improves all binarization segmentation metrics to varying degrees.
Conclusion
The proposed two-stage method can effectively distinguish complex backgrounds, further improves the binarization results, and achieves excellent performance on the DIBCO datasets. The source code is available at https://github.com/wjlbnw/Mask_Detail_Net.
Objective
Binarization is an essential preprocessing step for optical character recognition (OCR), especially when document images contain complex backgrounds. To recognize text faster and more accurately, document image binarization segments a captured color image or a grayscale image and outputs an image that contains only the text information. Converting textual information from hard copies to electronic copies requires large-scale data storage, yet a huge amount of textual information is still stored on hard copies, and traditional electronic archiving is still restricted by manual input. Document image binarization benefits from information-technology-based text carriers, and deep learning has facilitated the growth of text image binarization: multiple end-to-end convolutional neural network (CNN) models have been applied to the binarization of text images.
Method
Compared with traditional threshold-based document image binarization methods, deep learning-based methods exploit the semantic distribution characteristics of text pixels, so CNN-based binarization of text images is accurate to a certain extent. However, these methods are still challenged by text images with complex backgrounds, which cause high false-positive rates, and by insufficient training data: the network model is easily overfitted, the intermediate network layers are not easily activated during training, and the CNN tends to extract only low-level semantic features. Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so they must filter out complex background regions whose low-level features resemble those of text. We develop a two-stage binarization method to resolve this identification problem for document images in complex scenarios. The method is divided into two processing stages: screening of easily confused pixels and binary segmentation. According to the division of labor between the two stages, two networks with different structures are constructed. The former strengthens the ability to recognize and separate easily confused pixels in complex backgrounds: pseudo labels for false-positive regions are generated by maximum interclass variance thresholding and are combined with the real binarization labels so that the network learns to distinguish text pixels from background regions that are easily mistaken for text. The latter network focuses on the accurate prediction of text pixels, and its improved encoder-decoder structure helps it recover more accurate text pixel boundaries. The two networks process a complex-background text image successively, which improves the overall binarization performance on such images. Because each network performs only its own task, both can perform well with a compressed number of parameters. The two-stage design also reduces the labeling cost of the two network models: the labeled data can be reused to train multiple structures, training remains efficient even with less data, and several simple structures can jointly handle a single problem without expanding the model into a more complex network. To enhance the processing of text details, an asymmetric encoder-decoder structure is also proposed. It increases the number of convolution operations in a single encoder block to three or four, which enlarges the receptive field of the network and lets the multiple convolutions learn more non-linear features, while the decoder is streamlined and optimized. The asymmetric encoder-decoder balances model size against binarization quality and improves the binarization effect with only a small increase in the number of parameters.
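As a rough illustration of the pseudo-label idea described above, the sketch below derives an "easily confused pixel" mask by comparing a maximum interclass variance (Otsu) binarization with the ground-truth text mask; the exact label construction used by the authors may differ.

```python
# Sketch: background pixels that Otsu thresholding mistakes for text are taken
# as pseudo labels of easily confused regions for the stage-1 screening network.
import cv2
import numpy as np

def confusing_pixel_pseudo_label(gray: np.ndarray, gt_text: np.ndarray) -> np.ndarray:
    """gray: uint8 grayscale image; gt_text: binary mask, 1 = text pixel.
    Returns a binary mask, 1 = background pixel that Otsu classifies as text."""
    # Otsu picks the global threshold that maximizes the interclass variance;
    # THRESH_BINARY_INV maps dark (text-like) pixels to 1.
    _, rough = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # False positives: foreground under Otsu but background in the ground truth.
    return ((rough == 1) & (gt_text == 0)).astype(np.uint8)
```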
Result
Experiments compare the proposed method with recent methods on three datasets of the competition on document image binarization (DIBCO): DIBCO2016, DIBCO2017, and DIBCO2018. On DIBCO2018, the F-measure of the proposed method reaches 92.35%; on DIBCO2017 and DIBCO2016, the F-measure reaches 93.46% and 92.13%, respectively.
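For reference, the F-measure (FM) quoted above can be computed from a binary prediction and its ground truth as in the plain NumPy sketch below; the official DIBCO evaluation also reports additional metrics that are omitted here.

```python
# Sketch: F-measure in percent for binary masks where 1 marks text pixels.
import numpy as np

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)
```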
Conclusion
The experimental results show that the asymmetric encoder-decoder structure improves the binary segmentation metrics to varying degrees. The proposed two-stage network can handle complex backgrounds effectively: overfitting is suppressed, the intermediate network layers are trained properly, and the model extracts more high-level semantic features to separate text pixels from false-positive regions. The proposed two-stage binarization approach for complex backgrounds improves the binarization results on the DIBCO datasets. The code is available at https://github.com/wjlbnw/Mask_Detail_Net.
Keywords: semantic segmentation; U-Net; document image recognition; binarization; complex background; encoder-decoder structure; multistage segmentation