Open set text recognition technology

Yang Chun; Liu Chang; Fang Zhiyu; Han Zheng; Liu Chenglin; Yin Xucheng

Published: 2023-06-20
DOI: 10.11834/jig.230018
2023 | Volume 28 | Number 6

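Yang Chun¹,², Liu Chang¹, Fang Zhiyu¹, Han Zheng¹, Liu Chenglin³, Yin Xucheng¹,² (1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. Pattern Recognition and Artificial Intelligence Technology Innovation Laboratory, University of Science and Technology Beijing, Beijing 100083, China; 3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)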

Abstract

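In pattern recognition and text recognition applications in open environments, new data, new patterns, and new categories keep emerging, which requires algorithms to cope with patterns from new categories. To address this issue, researchers have begun to focus on the open-set text recognition (OSTR) task. This task requires that, at the test (inference) stage, an algorithm not only recognizes the character categories seen in the training set, but also recognizes, rejects, or discovers novel characters not seen in the training set. Open-set text recognition has gradually become one of the research hotspots in the text recognition field. This paper first briefly summarizes open-set pattern recognition techniques, and then focuses on the research background, task definition, basic concepts, research priorities, and technical difficulties of open-set text recognition. For the three major problems of open-set text recognition (unknown sample spotting, novel category recognition, and contextual information bias), related work is reviewed in terms of model structure, characteristics and advantages, and application scenarios. Finally, the development trends and research directions of open-set text recognition are analyzed and discussed.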
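The inference-time requirement above, recognizing seen characters while rejecting samples outside the training classes, can be illustrated with a generic open-set baseline: thresholding the maximum softmax probability at each decoding step. This is a minimal sketch under stated assumptions, not a method from this paper; the function names, the `<unk>` rejection token, and the 0.5 threshold are hypothetical, and the per-step logits are assumed to come from some closed-set recognizer.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize_or_reject(logits: Sequence[float], charset: Sequence[str],
                        threshold: float = 0.5, unknown: str = "<unk>") -> str:
    """Return the most probable seen character, or `unknown` when the top
    softmax probability is below `threshold` (hypothetical baseline)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return charset[best] if probs[best] >= threshold else unknown


def recognize_line(step_logits: Sequence[Sequence[float]], charset: Sequence[str],
                   threshold: float = 0.5) -> List[str]:
    # Apply the per-step recognize-or-reject decision to every decoding step.
    return [recognize_or_reject(logits, charset, threshold) for logits in step_logits]
```

For example, `recognize_line([[4.0, 0.1, 0.2], [0.3, 0.2, 0.1]], ["a", "b", "c"])` returns `["a", "<unk>"]`: the second step is nearly uniform, so its top probability (about 0.37) falls below the threshold. Such a baseline can reject an unseen character but cannot name it.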

Keywords

text recognition; open-set pattern recognition; open-set text recognition (OSTR); closed-set text recognition; zero-shot text recognition

Open set text recognition technology

Yang Chun¹,², Liu Chang¹, Fang Zhiyu¹, Han Zheng¹, Liu Chenglin³, Yin Xucheng¹,² (1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. Pattern Recognition and Artificial Intelligence Lab, University of Science and Technology Beijing, Beijing 100083, China; 3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract

Text recognition transcribes the text in images and is relevant to domains such as document digitization, content moderation, scene text translation, autonomous driving, and scene understanding. Conventional text recognition techniques mainly focus on recognizing characters seen during training. However, two factors are not well covered by the training sets of these methods: novel character categories and out-of-vocabulary (OOV) samples. Samples containing novel characters are usually also OOV samples, whereas OOV samples may consist only of seen characters in novel combinations or contexts. Novel character categories mainly arise in 1) internet environments, which involve unseen ligatures such as emoticons as well as unseen languages, and 2) scene text recognition, which involves characters from foreign and region-specific languages. Document digitization may also involve previously undiscovered characters. Because language usage is heterogeneous and hard to balance, linguistic statistics (e.g., n-grams and context) can gradually drift away from those of the training data, which challenges text recognition methods that depend heavily on the vocabulary. These two factors give rise to three key scientific problems that affect the cost and efficiency of open-world applications. First, novel characters call for a spotting capability: unseen characters should be rejected instead of being silently replaced by seen characters, which corresponds to the popular open-set recognition problem. Second, novel characters emerge frequently, and re-training upon each occurrence is costly, so an incremental recognition capability is needed; this problem has received considerable attention as the generalized zero-shot text recognition task.
Third, OOV samples call for robustness to linguistic bias. Owing to their character-based predictions, many popular methods can handle OOV samples composed of seen characters to some extent. However, this capability is limited, and methods relying on high-capacity language models show a strong vocabulary reliance. Since existing tasks such as zero-shot text recognition and OOV recognition each model only individual aspects of these problems, the open-set text recognition (OSTR) task addresses them jointly. This task aims to spot and recognize novel characters while remaining robust to linguistic skews. As an extension of the conventional text recognition task, OSTR also requires retaining decent recognition performance on seen content. In recent years, the OSTR task has developed rapidly in the text recognition field. This paper reviews the OSTR task and its related domains from five aspects: background, generic open-set recognition, concept, implementation, and summary. For the background, we introduce the application background of the OSTR task and analyze the specific cases from which it arises. For genericity, generic open-set recognition is briefly introduced as a preliminary to the OSTR task, since it is less familiar to some researchers in the text recognition field. For the concept, the definition of the OSTR task is introduced, followed by a discussion of its relationship with existing text recognition tasks, e.g., the conventional closed-set text recognition task and the zero-shot text recognition task. For implementation, common text recognition frameworks are first introduced, and OSTR methods are then viewed as derivations of these frameworks, organized around the three key scientific problems: new category spotting, incremental recognition of novel classes, and linguistic bias robustness.
Specifically, the new category spotting problem refers to rejecting samples that come from classes absent from a given label set. Slightly different from the generic open-set recognition task, the given label set is not necessarily tied directly to the training data. Incremental recognition refers to recognizing new categories from their side information without re-training the model. This definition differs slightly from the common zero-shot learning definition, as it excludes some generative adversarial network (GAN)-based transductive approaches. Linguistic bias robustness keeps its original definition, with more emphasis on unseen characters. For each scientific problem, solutions from text recognition and from related fields with similar modeling are covered. The evaluation part mainly covers the datasets and protocols used in the OSTR task and its related tasks, including 1) publicly available datasets used under multiple protocols, 2) commonly used metrics for measuring model performance, and 3) several popular protocols, together with typical methods and their performance. Here, a protocol refers to the composition of the training sets, testing sets, and evaluation metrics. For the summary, a comparative analysis of the development and technical preferences of existing methods is presented. Finally, potential trends and future research directions are discussed.
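The incremental recognition setting above, where new categories are recognized from side information without re-training and samples outside the given label set are rejected, can be approached by matching visual features against per-class prototypes. The following is one such prototype-matching scheme under stated assumptions, not the method of any specific paper: each prototype is assumed to be produced from a class's side information (for example, an embedded glyph image) by a frozen encoder that is not shown, and the class name `PrototypeRecognizer`, the cosine-similarity rule, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class PrototypeRecognizer:
    """Hypothetical open-set character classifier over a test-time label set.

    Each class is stored as a prototype vector assumed to come from side
    information via a frozen encoder, so adding a class needs no re-training.
    """

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.prototypes: Dict[str, List[float]] = {}

    def add_class(self, label: str, prototype: Sequence[float]) -> None:
        # Incremental step: register a new category from its side information.
        self.prototypes[label] = list(prototype)

    def predict(self, feature: Sequence[float]) -> Optional[str]:
        # Return the most similar label, or None (rejection) when no
        # prototype reaches the similarity threshold.
        best_label: Optional[str] = None
        best_sim = -1.0
        for label, proto in self.prototypes.items():
            sim = cosine(feature, proto)
            if sim > best_sim:
                best_label, best_sim = label, sim
        return best_label if best_sim >= self.threshold else None
```

For example, with prototypes `"A": [1, 0, 0]` and `"B": [0, 1, 0]`, the feature `[0, 0, 1]` is rejected (`None`); after `add_class("C", [0.0, 0.1, 1.0])`, the same feature is recognized as `"C"` without touching the other classes. Storing prototypes rather than fixed classifier weights is what lets the label set change at test time, and the threshold trades off spotting unknown characters against wrongly rejecting seen ones.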

Keywords

character recognition; open-set recognition; open-set text recognition (OSTR); closed-set text recognition; zero-shot text recognition
