A document image binarization method that separates complex backgrounds

Wang Hongxia; Wu Jiali; Chen Deshan

Published: 2023-07-19
DOI: 10.11834/jig.220098
2023 | Volume 28 | Number 7

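Wang Hongxia¹, Wu Jiali¹, Chen Deshan² (1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430061, China; 2. Intelligent Transportation System Research Center, Wuhan University of Technology, Wuhan 430061, China)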

Abstract

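Objective Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so distinguishing complex backgrounds whose low-level features resemble those of text is a pressing problem for binarization. To address the separation of complex backgrounds in document image binarization, a two-stage binarization method that separates the complex background of document images is proposed. Method The method consists of two processing stages, screening of easily misjudged pixels and binarization segmentation, and two networks with different structures are built according to the division of labor between the stages. The first network strengthens the ability to identify and separate easily misjudged pixels in complex backgrounds, while the second focuses on the accurate prediction of text pixels, which improves the performance of the overall method on images with complex backgrounds. Because each network handles only its own task, both can perform well with a reduced number of parameters, which further improves network efficiency. In addition, to improve the handling of text details, an asymmetric encoder-decoder structure is proposed, and two ways of combining it are given. Result The method is compared with other methods on the DIBCO2016, DIBCO2017, and DIBCO2018 datasets of the competition on document image binarization (DIBCO). On DIBCO2018 the method achieves an F-measure (FM) of 92.35%, only 0.17% below a method that uses special preprocessing, and its overall performance is better than that of the other methods. On DIBCO2017 and DIBCO2016 the FM is 93.46% and 92.13%, respectively, with the best overall performance among all methods. The experiments also show that the asymmetric encoder-decoder structure improves every binarization segmentation metric to varying degrees. Conclusion The proposed two-stage method effectively distinguishes complex backgrounds, further improves binarization results, and achieves excellent results on the DIBCO datasets. The open-source code is available at https://github.com/wjlbnw/Mask_Detail_Net.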
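
The F-measure (FM) reported above is conventionally the harmonic mean of pixel-level precision and recall on text pixels. The following is a minimal sketch of that computation, assuming binary masks in which 1 marks a text pixel. The function name `f_measure` and the toy masks are illustrative and are not taken from the authors' code.

```python
def f_measure(pred, truth):
    """Pixel-level F-measure between two binary masks (1 = text, 0 = background)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # text pixel correctly predicted as text
            elif p:
                fp += 1  # background pixel predicted as text
            elif t:
                fn += 1  # text pixel missed by the prediction
    if tp == 0:
        return 0.0  # no true positives, so the score is defined as zero here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2x3 masks, for illustration only.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(f"FM = {100 * f_measure(predicted, ground_truth):.2f}%")
```

A background region that is mistaken for text raises the false-positive count and lowers precision, which is why separating complex backgrounds shows up directly in this metric.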

Keywords

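semantic segmentation; U-Net; document image recognition; binarization; complex background; encoder-decoder structure; multistage segmentation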

A complex background-related binarization method for document-contextual information processing

Wang Hongxia¹, Wu Jiali¹, Chen Deshan² (1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430061, China; 2. Intelligent Transportation System Research Center, Wuhan University of Technology, Wuhan 430061, China)

Abstract

Objective Binarization is an essential step in optical character recognition (OCR), particularly for images with complex backgrounds. To recognize the text in an image faster and more accurately, document image binarization segments a captured color image or a derived grayscale image and outputs an image that contains only the text. Large-scale data storage calls for converting text from hard copies to electronic copies, yet a huge amount of textual information is still stored on hard copy, and entering it into electronic storage by hand is limited by manpower. Document image binarization supports this transfer of text to electronic carriers. Deep learning has advanced document image binarization, and multiple end-to-end convolutional neural network (CNN) models have been applied to it. Method Compared with traditional threshold-based document image binarization methods, deep learning-based methods take the semantic distribution of text pixels into account, and CNN-based methods are accurate to a certain extent. However, they still struggle on document images with complex backgrounds, where false positives are frequent and training data are insufficient: the network overfits easily, its intermediate layers are not easily activated during training, and the extracted features remain limited to low-level semantics. Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so complex backgrounds whose low-level features resemble those of text need to be told apart from the text.
We develop a two-stage binarization method to separate complex backgrounds in document images. The method is divided into two processing stages: screening of confusing pixels and binary segmentation. Following this division of labor, two networks with different structures are constructed. The first network strengthens the recognition and separation of easily misjudged pixels in complex backgrounds. Pseudo labels for false-positive regions are generated with the maximum inter-class variance (Otsu) method and combined with the ground-truth binarization labels, so that text pixels can be distinguished from background that is easily mistaken for text. The second network focuses on the accurate prediction of text pixels, and an improved encoder-decoder structure helps it delineate more accurate text boundaries. The two networks process an image successively, which improves the performance of the overall method on images with complex backgrounds. Each network performs only its own task and can do so with a reduced number of parameters, and dividing the work in this way lightens the load on each of the two models. The labeled data can be reused to train both structures, training remains efficient even with little data, and because different structures handle different parts of the problem, the model does not need to be expanded into a more complex network. To improve the handling of text details, an asymmetric encoder-decoder structure is also proposed. It increases the number of convolution operations within a single encoder to three or four. The additional convolutions enlarge the receptive field of the network.
Specifically, the multiple convolutions are used to learn nonlinear features, while the decoder is streamlined. The asymmetric encoder-decoder balances model size against accuracy and improves the binarization result with additional parameters. Result The method is compared with recent methods on three datasets from the competition on document image binarization (DIBCO): DIBCO2016, DIBCO2017, and DIBCO2018. On DIBCO2018, the F-measure of the method reaches 92.35%; on DIBCO2017 and DIBCO2016, it reaches 93.46% and 92.13%, respectively. Conclusion The experimental results show that the asymmetric encoder-decoder structure improves the binary segmentation metrics to some extent. The proposed two-stage network is able to handle complex backgrounds. Overfitting is effectively suppressed, the intermediate network layers are properly trained, and the model can extract more high-level semantic features to separate text pixels from false-positive regions. The two-stage binarization approach for complex backgrounds improves binarization results on the DIBCO datasets. The code is available at https://github.com/wjlbnw/Mask_Detail_Net.
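
The pseudo-label step mentioned in the Method part can be illustrated as follows. This is only a sketch of one plausible reading, not the authors' implementation: a maximum inter-class variance (Otsu) threshold gives a rough foreground, and dark pixels that the ground truth marks as background are flagged as easily misjudged. The three-class encoding, the function names, the assumption that text is darker than the page, and the toy image are all hypothetical.

```python
BACKGROUND, TEXT, CONFUSING = 0, 1, 2  # hypothetical class encoding


def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class; ties keep the smallest t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of gray levels in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def pseudo_labels(gray, truth):
    """Combine an Otsu result with ground truth into three-class labels.

    gray:  rows of integer gray levels (0..255), text assumed darker than paper.
    truth: rows of 0/1 ground-truth labels, 1 = text.
    """
    t = otsu_threshold([p for row in gray for p in row])
    labels = []
    for gray_row, truth_row in zip(gray, truth):
        row = []
        for p, is_text in zip(gray_row, truth_row):
            if is_text:
                row.append(TEXT)
            elif p <= t:
                row.append(CONFUSING)  # dark like text, but actually background
            else:
                row.append(BACKGROUND)
        labels.append(row)
    return labels


# Toy 3x3 image: the center pixel is a dark stain, not text.
gray = [[20, 200, 210],
        [30, 40, 220],
        [205, 25, 215]]
truth = [[1, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
for row in pseudo_labels(gray, truth):
    print(row)
```

In this toy case a global threshold cannot separate the stain from the text, which is the kind of pixel a dedicated screening stage is meant to catch.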

Keywords

semantic segmentation; U-Net; document image recognition; binarization; complex background; encoder-decoder structure; multistage segmentation
