Document Page Segmentation and Classification Based on Pattern-list Analysis[J]. Journal of Image and Graphics, 2005, 10(6): 741. DOI: 10.11834/jig.200506145.
A new algorithm based on pattern-list analysis is introduced for page segmentation and classification of document images in which irregularly shaped halftone regions are embedded in text regions. The algorithm consists of three steps. In the first step, all black pixels are extracted as bounding boxes and stored in a linked rectangle list. In the second step, all connected rectangles are grouped to form patterns and a pattern list. In the last step, the page image is classified into text regions and halftone regions according to the statistical features of the patterns. After these three steps, patterns that are still uncertain are further classified according to the types of their contextual patterns. Experimental results show that the proposed algorithm segments text and halftone regions quickly and performs well on complex document images.
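
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how bounding boxes are formed, which statistical features are used, or what thresholds apply. The sketch therefore assumes one rectangle per horizontal run of black pixels, uses only the bounding-box size of each pattern as a feature, and introduces a hypothetical threshold `max_text_size`. The names `Rect`, `extract_rects`, `group_patterns`, and `classify` are likewise illustrative, and the contextual reclassification of uncertain patterns is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: int
    y0: int
    x1: int  # inclusive
    y1: int  # inclusive


def extract_rects(image):
    """Step 1 (assumed form): one rectangle per horizontal run of black pixels."""
    rects = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                rects.append(Rect(start, y, x - 1, y))
            else:
                x += 1
    return rects


def touches(a, b):
    """True if two rectangles overlap or are directly adjacent (8-connectivity)."""
    return (a.x0 <= b.x1 + 1 and b.x0 <= a.x1 + 1 and
            a.y0 <= b.y1 + 1 and b.y0 <= a.y1 + 1)


def group_patterns(rects):
    """Step 2: merge touching rectangles with union-find; each set is a pattern."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; acceptable for a small illustration.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)
    return list(groups.values())


def classify(pattern, max_text_size=3):
    """Step 3 (assumed feature): small patterns are text, large ones halftone.

    max_text_size is a hypothetical threshold in pixels, not from the paper.
    """
    width = max(r.x1 for r in pattern) - min(r.x0 for r in pattern) + 1
    height = max(r.y1 for r in pattern) - min(r.y0 for r in pattern) + 1
    return "text" if max(width, height) <= max_text_size else "halftone"


# Toy page: a 2x2 blob (character-sized) and a 6x6 blob (halftone-sized).
image = [[0] * 12 for _ in range(8)]
for y in range(1, 3):
    for x in range(1, 3):
        image[y][x] = 1
for y in range(1, 7):
    for x in range(5, 11):
        image[y][x] = 1

patterns = group_patterns(extract_rects(image))
labels = [classify(p) for p in patterns]
```

On this toy page the two blobs are separated by a gap of two white columns, so they form two distinct patterns. A real page would need features that separate text from halftone more reliably than size alone, such as black-pixel density or run-length statistics.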