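In pattern recognition and text recognition applications in open environments, new data, new patterns, and new categories keep emerging, which requires algorithms to be able to handle patterns from novel categories. To address this problem, researchers have begun to focus on the Open-Set Text Recognition (OSTR) task. The task requires that, at the testing (inference) stage, an algorithm not only recognize the text categories seen in the training set but also recognize, reject, or discover novel characters unseen in the training set. In recent years, open-set text recognition has gradually become one of the research hotspots in the field of text recognition. This report first briefly summarizes open-set pattern recognition techniques, and then focuses on the research background, task definition, basic concepts, research focuses, and technical difficulties of open-set text recognition. In addition, for the three major problems of open-set text recognition (unknown sample discovery, novel category recognition, and contextual information bias), related work is surveyed from the perspectives of model structure, characteristics and advantages, and application scenarios. Finally, the development trends and research directions of open-set text recognition technology are analyzed and discussed.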
Open Set Text Recognition Technology
Yang Chun, Liu Chang, Fang Zhi Yu, Han Zheng, Liu Cheng Lin, Yin Xu Cheng (Institute of Automation, Chinese Academy of Sciences; China; USTB-PRIR Lab; China; University of Science and Technology Beijing)
Text recognition is the task of modeling the image-to-text transcription process. It is widely used in document digitization, content moderation, scene text translation, autonomous driving, scene understanding, and various other domains. Most existing text recognition technologies focus on recognizing seen characters and have achieved satisfactory results in the corresponding applications. However, these methods overlook the challenges posed by two factors: novel character categories and out-of-vocabulary (OOV) samples that are not covered by the training set. Note that samples with novel characters are always OOV samples, whereas an OOV sample may also be composed entirely of seen characters in novel combinations or contexts. Both factors arise in open environments where users do not have full knowledge of the incoming data, which is the case in many real-life applications. As for novel character categories, internet-oriented environments mainly face unseen ligatures such as emoticons, as well as unperceived languages, while scene-text recognition environments face characters from foreign and region-specific languages. In historical document digitization, the data may include undiscovered characters as well. Since language evolves over time and across regions, the linguistic statistics of the data (e.g., n-grams and context) can gradually drift away from those of the training data, which casts doubt on the reliability of text recognition methods that rely heavily on vocabulary. The two factors consequently give rise to three key scientific problems that affect cost or efficacy in open-world applications. First, novel characters call for a novel-character spotting capability: models need to reject unseen characters in the data instead of silently recognizing them as seen characters. In the field of text recognition, this problem has received little research attention so far; however, it can be formulated as the popular open-set recognition problem.
Furthermore, in many cases novel characters emerge over time, which makes retraining on each occurrence expensive and thus demands an incremental learning capability. This second problem has received a fair amount of attention in the form of the (generalized) zero-shot text recognition task. Linguistic bias robustness is the third scientific problem, and it stems from OOV samples. Owing to their character-based prediction, most modern methods can handle OOV samples formed by seen characters to some extent. However, this capability is considerably limited, as these methods tend to show strong vocabulary reliance due to their heavy use of language models. This problem is mainly studied in the OOV text recognition challenges, which have recently regained attention. Since existing tasks such as zero-shot text recognition and OOV recognition each model only individual aspects of the problems found in real-world applications, researchers have begun to consider the Open-Set Text Recognition (OSTR) task. The task aims to effectively spot and incrementally recognize novel characters while remaining robust to linguistic skew. As an extension of the conventional text recognition task, the OSTR task still requires the model to retain decent recognition capability on seen content. In recent years, the OSTR task has become one of the hot topics in the field of character recognition. This report summarizes previous research on the OSTR task and related fields. The main article includes five parts: the background, the concept, the implementation, the evaluation, and the summary. In the background part, this report first briefly introduces the application background of the OSTR task, explaining the specific use cases from which the task originated. Then, generic open-set recognition is briefly introduced as an important preliminary of the OSTR task, since it is less familiar to some researchers in the text recognition field.
The concept part first introduces the definition of the OSTR task, followed by a discussion of its relationship with existing text recognition tasks, e.g., the conventional closed-set text recognition task and the zero-shot text recognition task. In the implementation part, common text recognition frameworks are introduced first. The majority of OSTR implementations can be regarded as derivations of such frameworks, where the derivation involves solving the three key scientific problems: new category spotting, incremental recognition of novel classes, and linguistic bias robustness. Specifically, the new category spotting problem refers to rejecting samples that come from a class absent from a given label set. Slightly different from the generic open-set recognition task, the given label set does not have to be directly related to the training data. Incremental recognition refers to adapting to recognize new categories according to the side information of those categories, without retraining. This definition differs slightly from the common zero-shot learning definition in that it excludes some GAN-based transductive approaches. Linguistic bias robustness keeps its original definition but places more emphasis on unseen characters. The following sections of this part survey the three key scientific problems; for each, we survey its solutions in text recognition and in other fields with similar modeling. The evaluation part mainly covers the datasets and protocols used in the OSTR task and related tasks. Its first section surveys openly available datasets commonly used in various protocols. The next section introduces the metrics commonly used to measure model performance. The last section surveys various popular protocols, typical methods, and their performance. Here, a protocol refers to a combination of training sets, testing sets, and evaluation metrics.
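The definitions of new category spotting and incremental recognition given above can be illustrated with a toy sketch. It assumes a nearest-prototype classifier with a cosine-similarity rejection threshold, which is only one possible modeling choice and is not claimed to be a method surveyed in this report. The class name `PrototypeRecognizer`, the parameter `reject_threshold`, and the three-dimensional feature vectors are all hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class PrototypeRecognizer:
    """Toy open-set character classifier over a swappable label set (hypothetical)."""

    def __init__(self, prototypes, reject_threshold=0.8):
        # prototypes: dict mapping a character label to a feature vector
        # assumed to be derived from side information (e.g. a glyph template).
        self.prototypes = dict(prototypes)
        self.reject_threshold = reject_threshold  # assumed value, not from the report

    def register(self, label, prototype):
        """Incremental recognition: add a novel class without retraining."""
        self.prototypes[label] = prototype

    def predict(self, feature):
        """Return the best-matching label, or None to reject the sample."""
        best_label, best_score = None, -1.0
        for label, proto in self.prototypes.items():
            score = cosine(feature, proto)
            if score > best_score:
                best_label, best_score = label, score
        if best_score < self.reject_threshold:
            return None  # new category spotting: class absent from the label set
        return best_label


recognizer = PrototypeRecognizer({"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]})
sample = [0.1, 0.1, 0.99]            # made-up feature of a character outside the label set
before = recognizer.predict(sample)  # no prototype is similar enough, so it is rejected
recognizer.register("c", [0.0, 0.0, 1.0])
after = recognizer.predict(sample)   # recognized once the label set is extended
```

In this sketch the label set is just the current keys of the prototype dictionary, so it can be changed at inference time independently of whatever data produced the feature extractor, which mirrors the remark that the given label set need not be directly related to the training data.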
The final summary part compares the progress and technical preferences of domestic and foreign research groups. Finally, predictions of the trends and research directions in this field are given based on the limitations of current methods and the developments in related fields.
Character Recognition; Open-Set Recognition; Open-Set Text Recognition; Closed-Set Text Recognition; Zero-Shot Text Recognition