Multi-layer adaptive aggregation self-supervised few-shot learning image classification

Lyu Jia; Wu Ruoyu

Published: 2023-04-20
DOI: 10.11834/jig.211182
2023 | Volume 28 | Number 4

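Lyu Jia^1,2, Wu Ruoyu^1,2 (1. College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China; 2. Chongqing Research Center on Engineer Technology of Digital Agricultural & Services, Chongqing Normal University, Chongqing 401331, China)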

Abstract

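Objective In image classification, few-shot learning aims to use knowledge learned from a large-scale dataset to handle downstream classification tasks that contain only a few labeled training samples. Downstream tasks usually involve only novel-class samples. Because the meta-training stage constructs a large number of tasks by randomly sampling different classes from the training set, and because a domain gap exists between the classes of the training set and those of the test set, training takes a long time and the model may over-fit the training set. As a result, the meta-knowledge fails to transfer to the test set and the model generalizes poorly. To address these problems, a multi-layer adaptive aggregation self-supervised few-shot image classification model is proposed. Method First, the residual block is modified with group convolution to reduce the number of network parameters, lower the training difficulty, and shorten the training time. Then, the backbone is improved with multi-layer adaptive aggregation, which refines and aggregates the semantic information of each network layer, adaptively assigns a weight to each layer, and uses the aggregated feature map as the basis for subsequent classification. Finally, self-supervised contrastive learning is combined with supervised learning to mine the latent information of the samples themselves, thereby improving their feature representation. Result Classification experiments comparing the model with current mainstream models were conducted on the mini-ImageNet dataset and the CUB (Caltech-UCSD birds-200-2011) dataset. Compared with the baseline, the accuracy of the proposed model improves by 6.31 and 6.04 percentage points on the 5-way 1-shot and 5-way 5-shot experiments on mini-ImageNet, and by 8.95 and 8.77 percentage points on the 5-way 1-shot and 5-way 5-shot experiments on CUB. Conclusion The proposed model can, to some extent, shorten training time, enhance sample feature representation, and optimize the data distribution. It also alleviates the problems caused by the domain gap, thereby improving generalization and classification performance.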
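The parameter saving behind the group-convolution step of the Method can be illustrated with simple arithmetic. The sketch below counts the weights of a standard convolution and of a grouped one; the channel width (64), kernel size (3), and group count (4) are illustrative assumptions, not values reported by the paper, and the helper name `conv_weight_count` is hypothetical.

```python
def conv_weight_count(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Number of weights in a 2-D convolution without bias.

    Each group maps c_in/groups input channels to c_out/groups output
    channels, so the total weight count shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_weight_count(64, 64, 3)            # 64 * 64 * 9 weights
grouped = conv_weight_count(64, 64, 3, groups=4)   # 16 * 64 * 9 weights
reduction = standard / grouped                     # equals the group count
```

Under these assumed sizes the grouped layer holds one quarter of the weights of the standard layer, which is the kind of reduction the abstract attributes to the modified residual block.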

Keywords

few-shot learning; image classification; adaptive aggregation; self-supervised learning; contrastive learning

Multi-layer adaptive aggregation self-supervised few-shot learning image classification

Lyu Jia^1,2, Wu Ruoyu^1,2 (1. College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China; 2. Chongqing Research Center on Engineer Technology of Digital Agricultural & Services, Chongqing Normal University, Chongqing 401331, China)

Abstract

Objective Deep learning has advanced artificial intelligence (AI) fields such as image classification, natural language processing, speech recognition, and reinforcement learning. However, deep models are prone to over-fitting, and they require large amounts of effective data, especially in image classification tasks. To tackle this problem, few-shot learning was developed; it aims to use well-generalized knowledge learned from a large-scale dataset to handle downstream classification tasks with few training samples. Currently, most popular methods for few-shot image classification are based on meta-learning, which learns to deal with few-shot tasks by training on similar classification tasks. Meta-learning is divided into two steps: 1) meta-training and 2) meta-testing. In meta-training, an embedding network is trained on the meta-training set; in meta-testing, it is used to tackle downstream classification tasks constructed from a few training samples of the meta-testing set. The meta-training set and the meta-testing set share no categories, which means that the meta-learner obtains no reliable prior knowledge of the test categories during meta-training. This category difference between meta-training and meta-testing poses new challenges to meta-learning models: if a model focuses only on the training tasks, its effectiveness suffers when the meta-learner meets few-shot tasks with brand-new categories. To tackle this challenge with a metric-based method, we develop a multi-layer adaptive aggregation self-supervised few-shot classification model.

Method First, to reduce the parameters of the backbone and lower the training difficulty, group convolution is used to replace the original convolution. Next, to improve the backbone, a multi-layer adaptive aggregation module is introduced, which refines the information of each network layer dynamically and balances the weights of the layers adaptively; the aggregated feature maps are the basis for subsequent downstream few-shot classification. Finally, to enhance the transferability of the learned model, self-supervised contrastive learning is introduced to assist supervised learning in mining the latent information of the data themselves. Because contrastive learning requires no supervision, it does not suffer from over-fitting, and it also acts as an additional source of regularization that benefits the construction of the feature space. With the proposed self-supervised contrastive learning method, the embedding network pays more attention to learning well-generalized knowledge, which makes the distribution of embedded feature maps smoother and makes the classification model better suited to the domain of downstream tasks.

Result To validate the effectiveness of the proposed model, it is compared with popular models, including the prototype network, the relation network, and the cosine classifier, on the mini-ImageNet dataset and the Caltech-UCSD birds-200-2011 (CUB) dataset. On mini-ImageNet, the proposed model reaches an accuracy of 63.13% on 5-way 1-shot and 78.14% on 5-way 5-shot, which are 13.71 and 9.94 percentage points higher than the original Prototype network. On the fine-grained CUB dataset, it reaches 75.93% on 5-way 1-shot and 87.56% on 5-way 5-shot, which are 24.48 and 13.05 percentage points higher than the original Prototype network. Compared with the baseline, accuracy on 5-way 1-shot and 5-way 5-shot increases by 6.31 and 6.04 percentage points on mini-ImageNet and by 8.95 and 8.77 percentage points on CUB. A comparison with the parameter counts of 5 backbones on the Prototype network demonstrates that the parameters of our backbone are reduced. Ablation experiments are also conducted to verify the proposed model. Additionally, a heat-map comparison between the baseline and the proposed model verifies that our model keeps the embedding network from attending to excessive background information in images, which alleviates interference in downstream classification tasks. Furthermore, t-SNE is used to visualize the distribution of samples in the feature space. The feature distribution obtained in the t-SNE visualization experiment on the CUB dataset demonstrates that our model separates samples of different categories well, which can make the meta-testing set linearly separable.

Conclusion To resolve some problems in few-shot learning, we develop a multi-layer adaptive aggregation self-supervised few-shot classification model. To alleviate training difficulty, the improved group convolution is used to reduce the parameters of the backbone. To mitigate over-fitting and the domain gap, the multi-layer adaptive aggregation method and the self-supervised contrastive learning method are used to adjust the distribution of embedded feature maps. In particular, with our self-supervised contrastive learning method, the embedding network is not affected by redundant background in images.
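The metric-based setting that the abstract builds on, with the Prototype network as its reference point, can be sketched in a few lines. The example below shows only the standard prototype decision rule (class mean of support embeddings, nearest prototype by Euclidean distance); it is not the proposed model, and the 2-D embeddings, class names, and function names are made up for illustration.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for label, embedding in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-way 2-shot episode with hypothetical 2-D embeddings.
support = [
    ("bird", [0.9, 0.1]), ("bird", [1.1, -0.1]),
    ("dog", [-1.0, 0.2]), ("dog", [-0.8, 0.0]),
]
prototypes = class_prototypes(support)
prediction = classify([0.7, 0.05], prototypes)
```

In a real 5-way 1-shot or 5-way 5-shot episode the embeddings would come from the trained embedding network, so the quality of the feature space decides how well this simple rule works.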

Keywords

few-shot learning; image classification; adaptive aggregation; self-supervised learning; contrastive learning
