Local feature fusion network-based few-shot image classification
2023, Vol. 28, No. 7: 2093-2104
Print publication date: 2023-07-16
DOI: 10.11834/jig.220079
Dong Yangyang, Song Beibei, Sun Wenfang. 2023. Local feature fusion network-based few-shot image classification. Journal of Image and Graphics, 28(07):2093-2104
Objective
Few-shot learning is a challenging task that aims to classify new categories using only a limited number of labeled samples. Metric-based meta-learning methods are currently the mainstream approach to few-shot classification, but they typically use only the global features of an image, and classification performance depends heavily on the feature extraction network. To fully exploit local image features and improve the generalization ability of the model, a few-shot classification method based on local feature fusion is proposed.
Method
First, the input image is partitioned into multi-scale grid blocks and fed into the feature extraction network to obtain local features. Second, a Transformer-based local feature fusion module is designed to produce locally enhanced features that carry global information, improving the generalization ability of the model. Finally, classification is performed by computing the Euclidean distance between the feature vectors of query-set samples and the class prototypes of the support set.
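The paper does not give code for the multi-scale grid partition; a minimal NumPy sketch of the idea, assuming non-overlapping blocks and illustrative grid scales of 1 × 1 (the whole image) and 2 × 2:

```python
import numpy as np

def grid_partition(image: np.ndarray, grid_h: int, grid_w: int) -> list:
    """Split an image of shape (H, W, C) into grid_h x grid_w local blocks.
    Each block would later be fed to the shared feature extractor."""
    h, w = image.shape[0], image.shape[1]
    bh, bw = h // grid_h, w // grid_w
    blocks = []
    for i in range(grid_h):
        for j in range(grid_w):
            blocks.append(image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw])
    return blocks

def multi_scale_blocks(image: np.ndarray, scales=((1, 1), (2, 2))) -> list:
    """Collect local blocks from several grid scales,
    e.g. the whole image plus a 2 x 2 partition."""
    out = []
    for gh, gw in scales:
        out.extend(grid_partition(image, gh, gw))
    return out

img = np.zeros((84, 84, 3))   # MiniImageNet-sized input
blocks = multi_scale_blocks(img)
print(len(blocks))            # 5 blocks: 1 global + 4 local
print(blocks[-1].shape)       # (42, 42, 3)
```

The actual grid sizes and whether blocks overlap are design choices of the paper; the sketch only shows the partition mechanics.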
Result
The proposed method is compared with state-of-the-art methods on three datasets commonly used in few-shot classification. Under the 5-way 1-shot and 5-way 5-shot settings, relative to the second-best results, classification accuracy on the MiniImageNet dataset improves by 2.96% and 2.9%, respectively, and on the CUB (Caltech-UCSD Birds-200-2011) dataset by 3.22% and 1.77%, while accuracy on the TieredImageNet dataset is comparable to the best result. The experimental results demonstrate the effectiveness of the proposed method.
Conclusion
The proposed few-shot classification method makes full use of local image features while improving the feature extraction and generalization abilities of the model, yielding more accurate few-shot classification results.
Objective
Convolutional neural network (CNN) based deep learning has advanced image recognition, detection, segmentation and related fields. However, CNNs typically require a large number of labeled samples to learn well: when labeled data for some categories are insufficient, the model is prone to over-fitting, and collecting labeled samples is time-consuming and costly. Humans, by contrast, can learn from very few examples; given only a few pictures of each category, a person can readily recognize new images of those categories. To give CNN models a similar learning ability, a machine learning paradigm called few-shot learning has attracted increasing attention. Few-shot learning aims to classify new image categories from a limited amount of annotated data. Metric-based meta-learning methods are currently among the most effective approaches to few-shot learning. However, they are generally built on global features, which cannot adequately represent image structure; local features, which provide discriminative and transferable information across categories, should also be exploited. Some metric methods obtain pixel-level deep local descriptors as the local feature representation of an image by removing the last global average pooling layer of the CNN, but such descriptors sacrifice the contextual information of the image, which limits classification performance. Moreover, with only a few labeled instances, the feature extraction network struggles to learn a good feature representation and to generalize to new categories.
To utilize the local features of images and improve the generalization ability of the model, we develop a few-shot classification method based on local feature fusion.
Method
First, to obtain local features, the input image is divided into H × W local blocks that are then fed to the feature extraction network; multi-scale grid partitions are used so that the representation captures both the local information of the image and its context. Second, to learn and fuse the relationships among the multiple local feature representations, we design a local feature fusion module based on the Transformer architecture, whose self-attention mechanism can effectively capture and fuse the relationships among input sequences. After fusion, each local feature incorporates information from the other local features and thus carries global information; the multiple local feature representations of each image are concatenated as the final output. This enhances the feature representation of the original input image and improves the generalization ability of the model. Finally, the Euclidean distance between the query image embedding and each support class prototype is computed to classify the query image. Training proceeds in two stages: pre-training and meta-training. In the pre-training stage, the backbone network with an attached Softmax layer is trained to classify all images of the training set; to improve generalization, random cropping, horizontal flipping and color jittering are used for data augmentation. After pre-training, the backbone is initialized with the pre-trained weights and the other components are fine-tuned. In the meta-training stage, the episodic training strategy is adopted. For a fair comparison with other few-shot classification methods, ResNet12 is used as the backbone feature extractor, and the classification cross-entropy loss is optimized with stochastic gradient descent (SGD). The initial learning rate is set to 5 × 10^-4; we train for 100 epochs in total, halving the learning rate every 10 epochs, with 100 training episodes and 600 validation episodes in each epoch. Because the TieredImageNet dataset contains more samples and larger domain differences, more iterations are required for the model to converge, so we train for 200 epochs and halve the learning rate every 20 epochs. In the test stage, 5 000 episodes are randomly sampled from the test set to evaluate the average classification accuracy.
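The fusion step described above can be sketched with a single scaled dot-product self-attention layer over the local feature tokens of one image. This is a simplified NumPy illustration, not the paper's module: it omits multi-head attention, layer normalization and the feed-forward sublayer, and the projection weights here are random placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(tokens: np.ndarray, wq, wk, wv) -> np.ndarray:
    """One scaled dot-product self-attention pass over the N local feature
    tokens (N, d) of an image: every output token aggregates information
    from all the others, so each local feature becomes globally enhanced."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))   # (N, N) attention weights
    return tokens + attn @ v               # residual connection

rng = np.random.default_rng(0)
n, d = 5, 64                               # e.g. 5 local blocks, 64-dim features
tokens = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) * 0.01 for _ in range(3))
fused = self_attention_fuse(tokens, wq, wk, wv)
embedding = fused.reshape(-1)              # concatenate as the final embedding
print(embedding.shape)                     # (320,)
```

The concatenation at the end mirrors the method's final step of joining the fused local representations into one image embedding.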
Result
Comparative analysis is conducted on three benchmark datasets for few-shot classification. On the MiniImageNet dataset, average classification accuracy improves by 2.96% and 2.9% under the 5-way 1-shot and 5-way 5-shot settings, respectively. On the CUB dataset, average classification accuracy increases by 3.22% and 1.77%. On the TieredImageNet dataset, the proposed method matches the state-of-the-art in average classification accuracy. Extensive ablation experiments further verify the effectiveness of the proposed method.
Conclusion
We develop a local feature fusion-based method for few-shot classification. It makes full use of local features and improves both the feature extraction and generalization abilities of the model. The Transformer architecture based local feature fusion module further enhances feature representation and can potentially be embedded into other few-shot classification methods.
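The Euclidean-distance prototype classification used throughout the method (the standard prototypical-network metric step, shown here with illustrative toy embeddings) can be sketched as:

```python
import numpy as np

def classify_queries(support: np.ndarray, support_labels: np.ndarray,
                     queries: np.ndarray) -> np.ndarray:
    """Prototypical classification: each class prototype is the mean of its
    support embeddings; a query takes the label of the nearest prototype
    under Euclidean distance."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support[support_labels == c].mean(axis=0)
                           for c in classes])
    # pairwise squared Euclidean distances, shape (n_query, n_class)
    d2 = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return classes[d2.argmin(axis=1)]

# toy 2-way 2-shot episode in a 2-D embedding space
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0., 0.5], [5., 5.5]])
print(classify_queries(support, labels, queries))   # [0 1]
```

In the paper the embeddings would be the concatenated fused local features produced by the ResNet12 backbone and fusion module, not raw 2-D points.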
Keywords: few-shot learning; metric learning; local feature; Transformer; feature fusion