Few-Shot Image Classification Model Based on Meta-Cosine Loss
Tao Peng1, Feng Lin1, Du Yandong1, Gong Xun2, Wang Jun1 (1. Sichuan Normal University; 2. Southwest Jiaotong University) Abstract
Objective Metric learning is a simple and effective approach to few-shot learning, and learning a rich, discriminative, and well-generalizing embedding space is the key for metric learning methods to achieve excellent classification performance. Starting from the features of the samples themselves and from the distribution of features in the embedding space, this paper combines global and local data augmentation to realize a meta-cosine-loss few-shot image classification method (A Meta-Cosine Loss for Few-shot Image Classification, AMCL-FSIC). Method First, starting from the features of the data itself, global and local data augmentation are combined so that local information provides more discriminative and transferable cues and the trained model focuses more on the foreground of images. Meanwhile, an attention mechanism fuses global and local features to obtain richer and more discriminative features. Second, starting from the distribution of sample features in the embedding space, a meta-cosine loss (Meta-Cosine Loss, MCL) function is proposed to optimize the few-shot image classification model. The difference between the similarities of a sample to the class prototypes is used to adjust the prototypes of different classes, enlarging the inter-class distance so that classes are more clearly separated when the model is tested on new tasks, thereby improving its generalization ability. Result Comparisons with state-of-the-art methods are carried out on five classical few-shot datasets. The method achieves the best classification performance to date on FC100 and CUB, and results comparable to the latest models on MiniImageNet, TieredImageNet, and Cifar100. Comparative experiments on MiniImageNet, CUB, and Cifar100 further verify the effectiveness of MCL; their results show that the proposed MCL improves the classification performance of the cosine classifier. Conclusion The proposed method fully extracts image features in few-shot image classification tasks and effectively improves the accuracy of metric learning for few-shot image classification.
Keywords
A Meta-Cosine Loss for Few-shot Image Classification
Tao Peng1, Feng Lin1, Du Yandong1, Gong Xun2, Wang Jun1 (1. Sichuan Normal University; 2. Southwest Jiaotong University) Abstract
Objective Few-shot learning (FSL) is a challenging problem in computer vision that aims to achieve effective classification with only a few labeled samples. Recent few-shot learning methods fall into three main categories: metric-based, transfer-based, and gradient-based. Among them, metric-based methods have received much attention because of their simplicity and excellent performance on few-shot problems. Specifically, a metric-learning method consists of a feature extractor based on a convolutional neural network (CNN) and a classifier based on spatial distance. Samples are mapped into an embedding space, where a simple metric function computes the similarity between a sample and each class prototype so that novel-class samples can be recognized quickly. Because classification is performed by a metric function, the optimization problem of learning a classifier in the few-shot setting is bypassed. A rich, discriminative, and well-generalizing embedding space is therefore the key to metric learning methods. To improve their accuracy, we start from the features themselves and from their distribution in the embedding space, combine the global and local features of samples, and propose a meta-cosine loss for few-shot image classification, termed AMCL-FSIC. Method On the one hand, our primary objective is to obtain suitable features. An image contains foreground and background; the foreground benefits few-shot classification, while the background is harmful. Forcing the model to focus only on the foreground during training and evaluation and to ignore the background would help classification, but this is not easy, because it requires prior knowledge of the foreground object. As previous researchers have noted, image features can be roughly divided into global features and local features, the latter extracted from randomly cropped portions of each image. Local features carry cross-category discriminative and transferable information, which is of great significance for few-shot image classification. We therefore first combine global and local data augmentation strategies: the local information of an image makes the model pay more attention to the distinctive and transferable characteristics of a sample, minimizing the influence of background information. A self-attention mechanism is then introduced to fuse global and local features, yielding richer and more discriminative representations. On the other hand, starting from the feature distribution in the embedding space, we meta-train a cosine classifier and minimize a loss computed from the cosine similarities between samples and class prototypes. In the embedding space, features of the same class cluster together and features of different classes lie far apart. Previous cosine classifiers, however, only pull samples of the same class together during training and do not fully push samples of different classes apart. The direct consequence is that the generalization ability of the model degrades when it faces new test tasks containing similar categories. Building on the cosine classifier, we propose the meta-cosine loss (MCL). During meta-training, the difference between the cosine similarities of a sample to the class prototypes is used to adjust the prototypes according to the parallelogram rule. MCL pushes the feature clusters of different classes in a task as far apart as possible, which keeps the classes separable when the model faces a new test task and improves the generalization ability of the model.
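To make the global-local fusion step concrete, the following is a minimal PyTorch sketch, not the authors' code: the module name GlobalLocalFusion, the shared backbone, and the use of nn.MultiheadAttention to fuse a global embedding with the embeddings of K random local crops are illustrative assumptions based on the description above.

```python
# Hedged sketch of global-local feature fusion via self-attention.
# All names and the attention choice are assumptions, not the paper's code.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, n_heads: int = 4):
        super().__init__()
        self.backbone = backbone  # shared CNN feature extractor: image -> (B, D)
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, global_img: torch.Tensor, local_crops: torch.Tensor):
        # global_img: (B, C, H, W); local_crops: (B, K, C, h, w)
        B, K = local_crops.shape[:2]
        g = self.backbone(global_img)                  # (B, D) global embedding
        l = self.backbone(local_crops.flatten(0, 1))   # (B*K, D) local embeddings
        l = l.view(B, K, -1)
        tokens = torch.cat([g.unsqueeze(1), l], dim=1)  # (B, 1+K, D) token set
        fused, _ = self.attn(tokens, tokens, tokens)    # self-attention over tokens
        return fused.mean(dim=1)                        # (B, D) fused representation

# usage with a toy backbone and random tensors (dims are hypothetical)
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = GlobalLocalFusion(backbone, feat_dim=64)
g = torch.randn(8, 3, 84, 84)       # global views
l = torch.randn(8, 4, 3, 42, 42)    # 4 random local crops per image
feats = model(g, l)                 # (8, 64) fused features
```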
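The MCL adjustment can be sketched in the same spirit. The abstract does not give the exact formulation, so the scheme below, shifting each non-target prototype away from the query direction in proportion to how small the similarity gap is (a vector-addition, i.e. parallelogram, construction), is one plausible reading rather than the paper's definition; meta_cosine_loss, scale, and alpha are hypothetical names.

```python
# Hedged sketch of a cosine classifier with an MCL-style prototype adjustment.
import torch
import torch.nn.functional as F

def meta_cosine_loss(queries, prototypes, labels, scale=10.0, alpha=0.1):
    # queries: (Q, D) query embeddings; prototypes: (N, D); labels: (Q,)
    q = F.normalize(queries, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    sim = q @ p.t()                                # (Q, N) cosine similarities
    target_sim = sim.gather(1, labels[:, None])    # (Q, 1) similarity to own class
    gap = (target_sim - sim).clamp(min=0)          # small gap -> stronger push
    # shift each non-target prototype away from the query direction,
    # weighted by the similarity gap (parallelogram-rule vector update)
    push = (1.0 - gap).unsqueeze(-1) * q.unsqueeze(1)          # (Q, N, D)
    onehot = F.one_hot(labels, p.size(0)).unsqueeze(-1).float()  # keep target fixed
    adj_p = F.normalize(p.unsqueeze(0) - alpha * push * (1 - onehot), dim=-1)
    logits = scale * (q.unsqueeze(1) * adj_p).sum(-1)          # (Q, N)
    return F.cross_entropy(logits, labels)

# usage with random tensors (5-way task, hypothetical dims)
q, protos = torch.randn(75, 64), torch.randn(5, 64)
y = torch.randint(0, 5, (75,))
loss = meta_cosine_loss(q, protos, y)
```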
Result We conduct extensive experiments to verify the effectiveness of the model. Experiments are performed on five classical few-shot datasets: MiniImageNet, TieredImageNet, Cifar100, FC100, and CUB. Input images are resized to 84×84 for training; the momentum is set to 0.95, the learning rate to 0.0002, and the weight decay to 0.0001. Training is accelerated on an NVIDIA GeForce RTX 3090 GPU. To ensure fair comparison, we adopt the 5-way 1-shot and 5-way 5-shot settings in both the training and testing phases. Under these settings, the classification accuracies on MiniImageNet, TieredImageNet, Cifar100, FC100, and CUB are 68.92%/84.45%, 72.41%/87.36%, 76.79%/88.52%, 50.86%/67.19%, and 81.12%/91.43%, respectively; compared with the latest few-shot image classification methods, our model is more competitive. We also carry out comparative experiments on MiniImageNet, CUB, and Cifar100 to verify the effectiveness of the meta-cosine loss. The results show that, under the 1-shot and 5-shot settings, adding MCL to the cosine classifier improves classification accuracy by nearly 4% and 2%, respectively, demonstrating that MCL substantially improves the classification ability of the cosine classifier. Conclusion We propose the meta-cosine loss and combine global and local data augmentation to improve the generalization ability of the model; the approach is applicable to any metric-based method.
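The abstract fixes the training hyperparameters but not the optimizer family; the snippet below assumes SGD and merely records the stated settings (84×84 inputs, momentum 0.95, learning rate 0.0002, weight decay 0.0001). The transform pipeline and the placeholder model are illustrative.

```python
# Stated in the abstract: 84x84 inputs, lr=0.0002, momentum=0.95, wd=0.0001.
# SGD itself and the transform/backbone choices are assumptions.
import torch
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomResizedCrop(84),     # 84x84 training inputs
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
model = torch.nn.Linear(3 * 84 * 84, 5)   # placeholder for the real backbone
optimizer = torch.optim.SGD(model.parameters(), lr=2e-4,
                            momentum=0.95, weight_decay=1e-4)
```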
Keywords