
Chen Yu1, Wang Dahan1, Chi Xueke1, Jiang Nanfeng1, Zhang Xuyao2, Wang Chiming1, Zhu Shunzhi1 (1. Xiamen University of Technology; 2. Institute of Automation, Chinese Academy of Sciences)

Abstract
Objective Zero-shot Chinese character recognition (ZSCCR) has drawn wide attention because it can recognize unseen Chinese characters with zero or few training samples. Most existing ZSCCR methods adopt a radical-sequence matching framework: a radical sequence is first predicted and then matched against an ideographic description sequence (IDS) dictionary by minimum edit distance (MED). However, existing MED algorithms assume identical substitution, insertion, and deletion costs for all radicals, which leads to ambiguous and redundant distance costs among candidate character classes during matching. To address this, we propose a character-aware edit distance (CAED) to match the target character class correctly. Method We design several radical-information extraction methods to obtain finer-grained radical descriptions, yielding more precise radical substitution costs and improving the robustness and effectiveness of MED. In addition, a radical counting module predicts the number of radicals in a sample, forming a cost gate that constrains and adjusts the insertion and deletion costs and thus mitigates the effect of inaccurate IDS sequence-length prediction. Result Experiments on handwritten, scene, and ancient Chinese character datasets show that, compared with previous methods, the proposed CAED improves the accuracy of recognizing unseen character classes by 4.64%, 1.1%, and 5.08%, respectively, while maintaining comparable performance on seen classes, fully demonstrating the effectiveness of the method. Conclusion The proposed character-aware edit distance lets the substitution, insertion, and deletion costs adapt to each character, effectively improving the recognition of unseen Chinese characters.
Character-aware edit distance for zero-shot Chinese character recognition

Chen Yu1, Wang Dahan1, Chi Xueke1, Jiang Nanfeng1, Zhang Xuyao2, Wang Chiming1, Zhu Shunzhi1 (1. Xiamen University of Technology; 2. Institute of Automation, Chinese Academy of Sciences)

Objective Zero-shot Chinese character recognition (ZSCCR) has attracted increasing attention in recent years because it can recognize unseen Chinese characters with zero or few training samples. The fundamental idea of zero-shot learning is to solve the new-class recognition problem by generalizing semantic knowledge from seen classes to unseen classes, usually represented by auxiliary information such as attribute descriptions shared across classes. Chinese characters are composed of multiple radicals, so radicals are commonly used as attributes shared among character classes. Most existing ZSCCR methods adopt a radical-based sequence matching framework that recognizes a character by predicting its radical sequence and then performing minimum edit distance (MED) matching against an ideographic description sequence (IDS) dictionary. MED quickly compares the predicted radical sequence with each entry in the IDS dictionary, measures the difference between the two sequences, and thereby determines the unseen character class. However, the standard algorithm uses a 0-1 substitution cost and unit insertion and deletion costs, implicitly assuming that the substitution cost is identical between all pairs of radicals. In practice, the substitution cost between similar radicals should be lower than that between dissimilar radicals. Moreover, the approach lacks flexibility: when the predicted IDS sequence is too long or too short, redundant insertion or deletion costs accumulate. Consequently, we propose a character-aware edit distance (CAED) that extracts refined radical substitution costs and accounts for the impact of insertion and deletion costs. Method We propose CAED, which adaptively adjusts the substitution, insertion, and deletion costs in the edit distance according to each Chinese character so as to match the correct unseen character class.
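The matching step described above can be sketched as a standard dynamic-programming edit distance whose substitution cost is supplied per radical pair. The following is a minimal illustration, not the paper's implementation; the function name and the toy similarity table are our own assumptions:

```python
def char_aware_edit_distance(pred, cand, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance between a predicted radical sequence and a candidate
    IDS entry, with a per-pair substitution cost."""
    m, n = len(pred), len(cand)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost  # delete all extra predicted radicals
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost  # insert all missing radicals
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # deletion
                d[i][j - 1] + ins_cost,  # insertion
                d[i - 1][j - 1] + sub_cost(pred[i - 1], cand[j - 1]),
            )
    return d[m][n]

# Classic MED: every substitution costs 1 unless the radicals are identical.
med = lambda a, b: 0.0 if a == b else 1.0

# Character-aware variant: similar radicals are cheaper to substitute.
# The similarity table here is a toy stand-in for the refined costs.
soft = lambda a, b: 0.0 if a == b else (0.3 if {a, b} == {"土", "士"} else 1.0)
```

With `med`, misreading 士 as 土 costs a full unit, the same as any other confusion; with `soft`, the distance between the sequences 土口 and 士口 drops from 1.0 to 0.3, so the visually similar candidate still wins the match.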
In ZSCCR, the key to the radical-based approach lies in recognizing radical sequences and in the metric between predicted and candidate sequences; the accuracy of this metric directly determines the performance of the final model. The metric between radical sequences therefore needs to be refined. Specifically, the proposed CAED analyzes each cost term in the edit distance. The similarity probability between different radicals is computed as the substitution cost by assigning weights to each radical's structure, stroke count, components, and four-corner code. The distance cost between different radicals is thus finely adjusted, improving the robustness and performance of MED. In addition, we introduce a radical counting module that predicts the number of radicals in a sample. By comparing the predicted count with the length of the predicted radical sequence, we impose constraints on the insertion and deletion costs, which mitigates the problem of predicted radical sequences that are too long or too short. On this basis, we obtain a more refined distance between radical sequences. Compared with traditional methods, our method can match the correct character class with the shortest distance even when radicals are misrecognized as similar ones, when sequence lengths mismatch, or when both occur simultaneously. Result Experiments are conducted on the handwriting database (HWDB) and the 12th International Conference on Document Analysis and Recognition (ICDAR 2013) competition dataset, the Chinese text in the wild (CTW) dataset, and the ancient handwritten characters database (AHCDB). On the handwritten and scene Chinese character datasets, the proposed CAED consistently outperforms current state-of-the-art ZSCCR methods, showcasing its superiority. CAED is then integrated with other networks on the ancient Chinese dataset to demonstrate its scalability.
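The two ingredients just described, a weighted multi-attribute substitution cost and a count-based gate on the insertion and deletion costs, might look roughly like the sketch below. The attribute fields, the weights, and the gating rule are illustrative assumptions, not the paper's exact formulation:

```python
def substitution_cost(r1, r2, w=(0.3, 0.2, 0.3, 0.2)):
    """Hypothetical per-pair cost: 1 minus a weighted similarity over a
    radical's structure tag, stroke count, component set, and four-corner
    code (field names and weights are illustrative)."""
    if r1 == r2:
        return 0.0
    s_struct = 1.0 if r1["structure"] == r2["structure"] else 0.0
    s_stroke = 1.0 - abs(r1["strokes"] - r2["strokes"]) / max(r1["strokes"], r2["strokes"])
    union = r1["components"] | r2["components"]
    s_comp = len(r1["components"] & r2["components"]) / len(union) if union else 0.0
    s_fc = sum(a == b for a, b in zip(r1["four_corner"], r2["four_corner"])) / len(r1["four_corner"])
    sim = w[0] * s_struct + w[1] * s_stroke + w[2] * s_comp + w[3] * s_fc
    return 1.0 - sim

def gated_costs(pred_len, counted, base=1.0, factor=2.0):
    """Hypothetical cost gate driven by the radical counting module:
    an over-long prediction makes deletions cheaper and insertions dearer,
    an over-short one does the reverse, and agreeing lengths penalize both."""
    if pred_len > counted:
        return base * factor, base / factor  # (ins_cost, del_cost)
    if pred_len < counted:
        return base / factor, base * factor
    return base * factor, base * factor
```

The returned pair would then be passed as the insertion and deletion costs of the edit distance, so that length-correcting edits are only cheap in the direction the counting module supports.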
Additionally, we evaluate the performance of the radical counting module, given its direct impact on the cost gating. Ablation experiments validate the effectiveness of the insertion and deletion cost-constraint module and of the substitution cost-refinement module, and a combinatorial analysis of the multiple pieces of information contributing to the substitution cost determines their respective weights. Finally, to evaluate CAED on purely seen character classes, we conduct conventional Chinese character recognition experiments and reach an accuracy of 97.02% on ICDAR2013; although not the best result, CAED remains highly competitive across all comparisons. Overall, the experiments show a notable improvement in unseen-character accuracy: CAED achieves gains of 4.64%, 1.1%, and 5.08% over other methods on the handwritten (HWDB and ICDAR2013), scene (CTW), and ancient (AHCDB) datasets, respectively. Conclusion A character-aware edit distance for zero-shot Chinese character recognition is proposed, in which the edit costs depend adaptively on the character. The method refines the substitution cost between radicals with multiple pieces of information, correcting similar radicals that the model confuses. Moreover, a radical counting module forms a cost gate that constrains the insertion and deletion costs, alleviating the problem of mismatched radical-sequence lengths. In addition, the method can be combined with any network based on radical-sequence recognition to improve robustness to misrecognition.