Few-shot image recognition supervised by class semantic similarity

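Xu Pengbang1, Sang Jitao1, Lu Dongyuan2 (1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; 2. School of Information Technology and Management, University of International Business and Economics, Beijing 100029)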

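Abstract
Objective Existing deep learning models often require large-scale training data, whereas few-shot classification aims to recognize target classes that have only a few labeled samples. Metric-based meta-learning, currently the mainstream approach to few-shot learning, mostly does not use samples of the few-shot target classes during training, so the feature representations these models learn do not generalize well to the target classes. To improve the generalization ability of meta-learning-based few-shot image recognition, this paper proposes a few-shot image recognition method based on class semantic similarity supervision. Method The classic word-embedding model GloVe (global vectors for word representation) is used to learn a word-embedding vector for the English name of each class in the image dataset, and the cosine distance between class word-embedding vectors represents the semantic similarity between classes. By integrating the semantic correlation between classes as prior knowledge, a semantic similarity measure between classes is introduced as additional supervision during model training, yielding a feature representation with stronger class-sample feature constraint and generalization abilities. Result Extensive experiments on two few-shot learning benchmark datasets, miniImageNet and tieredImageNet, verify the effectiveness of the proposed method. On the 5-way 1-shot and 5-way 5-shot settings of miniImageNet, the proposed method improves classification accuracy over prototypical networks by 1.9% and 0.32%, respectively; on the 5-way 1-shot setting of tieredImageNet, it improves classification accuracy over prototypical networks by 0.33%. Conclusion The proposed few-shot image recognition model based on class semantic similarity supervision improves the generalization ability of few-shot learning methods and the accuracy of few-shot image recognition.
Keywords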
Few shot image recognition based on class semantic similarity supervision

Xu Pengbang1, Sang Jitao1, Lu Dongyuan2(1.School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;2.School of Information Technology and Management, University of International Business and Economics, Beijing 100029, China)

Abstract
Objective Deep learning has made remarkable achievements in many fields, such as image recognition, object detection, and speech recognition. However, most of these achievements depend on very large datasets: existing deep learning models often need large-scale training data. Building large-scale training datasets not only requires a large amount of manpower and material resources but is also infeasible in some scenarios, such as collecting many samples of rare image classes. Inspired by the fact that human children can learn to distinguish an object from a small number of samples, few-shot image classification aims to identify target categories with only a few labeled samples. Image recognition based on few-shot learning thus addresses the dependence of deep learning models on large-scale training data. At present, the mainstream methods of few-shot image recognition are based on meta-learning, which mainly comprises three families: metric-based, optimization-based, and model-based meta-learning. Meta-learning is divided into two stages, training and testing. However, most metric-based meta-learning methods do not use the few samples of the target classes in the training stage, which leaves these models with poor generalization ability. They often show high accuracy in the training stage but recognize few-shot image categories poorly in the test stage, because the deep feature representation they learn cannot be effectively generalized to the target classes. To improve the generalization ability of few-shot image recognition, this study proposes a few-shot learning method based on class semantic similarity supervision.
Method The proposed method consists of two steps: first, obtain the class similarity matrix between the classes of the image dataset; second, use this matrix as additional supervision information to train the few-shot image recognition model. The details are as follows. GloVe (global vectors for word representation), an unsupervised word-vector learning algorithm, is trained on a Common Crawl corpus of billion-scale web page data and generates a 300-dimensional vector for every word. For classes whose names contain more than one word, we look up every word in the trained GloVe model and average the resulting word-embedding vectors to obtain the word-embedding vector of the class name. The cosine distance between the word-embedding vectors of classes is then used to represent the semantic similarity between classes. In addition to the negative log-probability loss computed from the category labels, as in the original prototypical networks, this study introduces the semantic similarity measure between categories as extra supervision information in the training stage to establish an implicit relationship between the source classes and the few-shot target classes, which gives the model better generalization ability. Furthermore, the class semantic similarity loss constrains the features the model learns within and between classes, so that sample features within each class are more similar and the distribution of sample features across classes is more consistent with the semantic similarity between categories. By supervising training with this loss, the proposed model implicitly learns the relationships between classes and obtains a feature representation with stronger class-sample feature constraint and generalization abilities.
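The two steps described in the method can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_glove` is a hypothetical 3-dimensional stand-in for the 300-dimensional GloVe vectors, all function names are invented for the example, and `semantic_similarity_loss` is only one possible form of the loss (a mean squared gap), since the abstract does not give its exact definition. The sketch computes cosine similarity; the abstract speaks of cosine distance, which is commonly taken as one minus this value.

```python
import math

# Hypothetical 3-d stand-in for pretrained GloVe vectors (assumption:
# the paper uses 300-d vectors trained on Common Crawl).
toy_glove = {
    "golden": [0.9, 0.1, 0.0],
    "retriever": [0.7, 0.3, 0.0],
    "wolf": [0.6, 0.4, 0.1],
    "bus": [0.0, 0.2, 0.9],
}


def class_embedding(class_name, vectors):
    """Average the word vectors of every known word in the class name."""
    words = [w for w in class_name.lower().split() if w in vectors]
    if not words:
        raise KeyError(f"no word of {class_name!r} found in the vocabulary")
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(class_names, vectors):
    """Pairwise cosine similarity between class-name embeddings."""
    embs = [class_embedding(name, vectors) for name in class_names]
    return [[cosine_similarity(a, b) for b in embs] for a in embs]


def semantic_similarity_loss(prototypes, sem_sim):
    """Assumed loss form: mean squared gap between the cosine similarity
    of class prototypes in feature space and the semantic similarity."""
    n = len(prototypes)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gap = cosine_similarity(prototypes[i], prototypes[j]) - sem_sim[i][j]
            total += gap * gap
    return total / (n * n)


classes = ["golden retriever", "wolf", "bus"]
S = similarity_matrix(classes, toy_glove)
for name, row in zip(classes, S):
    print(name, [round(value, 3) for value in row])
```

In a training loop, a term of this kind would be added to the classification loss with a weighting coefficient; both the weighting and the squared-gap form are assumptions of this sketch.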
Result The proposed model is compared with several state-of-the-art few-shot image classification models, including prototypical networks, matching networks, relation networks, and other classic methods, through extensive experiments on miniImageNet and tieredImageNet. The results show that the proposed method is effective and competitive with current advanced methods. To ensure a fair comparison, the classical meta-learning paradigm is used to train and test the model, and experiments are conducted on the widely used 5-way 1-shot and 5-way 5-shot settings. On the 5-way 1-shot and 5-way 5-shot settings of the miniImageNet dataset, the classification accuracy of the proposed method is 1.9% and 0.32% higher, respectively, than that of prototypical networks, the classical meta-learning method for few-shot image recognition. On the 5-way 1-shot setting of the tieredImageNet dataset, classification accuracy is 0.33% higher than that of prototypical networks. On the 5-way 5-shot setting of the tieredImageNet dataset, the proposed model achieves a result competitive with prototypical networks. In addition, several ablation experiments verify the effectiveness of the key modules of the proposed method, and the influence of the class semantic similarity prior on the experimental results is analyzed from multiple perspectives.
Conclusion This study proposes a few-shot image recognition model based on class semantic similarity supervision, which improves the generalization ability and class-feature constraint ability of the few-shot image recognition model. Experimental results show that the proposed method improves the accuracy of few-shot image recognition.
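The N-way K-shot evaluation setting mentioned in the results can be illustrated with a small sketch. It assumes the standard prototypical-network rule (a class prototype is the mean of its support features, and a query is assigned to the nearest prototype by squared Euclidean distance). The toy 2-dimensional features stand in for the output of a trained embedding network, and the function names and the choice of 15 query samples per class are hypothetical.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query items each."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = {}, []
    for label in classes:
        items = rng.sample(data_by_class[label], k_shot + n_query)
        support[label] = items[:k_shot]
        query.extend((x, label) for x in items[k_shot:])
    return support, query


def prototype(vectors):
    """Class prototype: element-wise mean of the support features."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify(x, prototypes):
    """Assign x to the label of the nearest prototype (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(x, prototypes[label]))


# Toy features (assumption): 6 well-separated classes, 20 samples each.
rng = random.Random(0)
data = {
    f"class_{c}": [[10.0 * c + rng.uniform(-1, 1), rng.uniform(-1, 1)]
                   for _ in range(20)]
    for c in range(6)
}

# One 5-way 1-shot episode with 15 query samples per class.
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=15, rng=rng)
prototypes = {label: prototype(vs) for label, vs in support.items()}
accuracy = sum(classify(x, prototypes) == y for x, y in query) / len(query)
print(f"episode accuracy: {accuracy:.2f}")
```

Setting `k_shot=5` gives the 5-way 5-shot variant; reported accuracies in this setting are typically averaged over many such episodes.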
Keywords
