Few-shot image recognition based on class semantic similarity supervision
2021, Vol. 26, No. 7, pp. 1594-1603
Print publication date: 2021-07-16
Accepted: 2021-02-01
DOI: 10.11834/jig.200504
Pengbang Xu, Jitao Sang and Dongyuan Lu. Few-shot image recognition based on class semantic similarity supervision[J]. Journal of Image and Graphics, 2021, 26(7): 1594-1603.
Objective
Existing deep-learning models often require large-scale training data, whereas few-shot classification aims to recognize target categories for which only a few labeled samples are available. Metric-based meta-learning, currently the mainstream approach to few-shot learning, mostly does not use samples of the few-shot target classes during training, so the learned feature representations do not generalize well to those target classes. To improve the generalization ability of meta-learning-based few-shot image recognition, this paper proposes a few-shot image recognition method supervised by class semantic similarity.
Method
The classic word-embedding model GloVe (global vectors for word representation) is used to learn a word-embedding vector for the English name of each class in the image dataset, and the cosine distance between class word-embedding vectors is used to represent the semantic similarity between classes. By integrating this inter-class semantic relatedness as prior knowledge and introducing the class semantic similarity measure as additional supervision during training, the model learns a feature representation with stronger ability to constrain class sample features and to generalize. A sketch of the similarity computation is given below.
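As a concrete illustration of this step, the following sketch computes class-name embeddings and a class similarity matrix from pre-trained GloVe vectors. It is a minimal sketch, assuming the 300-dimensional Common Crawl GloVe vectors are available as a plain-text file; the file name, class names, and preprocessing shown here are illustrative rather than the authors' exact pipeline.

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a plain-text file: one token per line,
    followed by its 300 floating-point components."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def class_embedding(name, glove):
    """Average the GloVe vectors of all words in a (possibly multi-word)
    class name, as described in the Method section."""
    words = name.lower().replace("_", " ").split()
    vecs = [glove[w] for w in words if w in glove]
    return np.mean(vecs, axis=0)

def class_similarity_matrix(class_names, glove):
    """Cosine similarity between every pair of class-name embeddings."""
    emb = np.stack([class_embedding(n, glove) for n in class_names])
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb @ emb.T  # (C, C) matrix, entries in [-1, 1]

# Illustrative usage (file name and class list are placeholders):
# glove = load_glove("glove.42B.300d.txt")
# S = class_similarity_matrix(["golden retriever", "school bus", "lion"], glove)
```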
Result
Extensive experiments on two few-shot learning benchmark datasets, miniImageNet and tieredImageNet, verify the effectiveness of the proposed method. On the 5-way 1-shot and 5-way 5-shot settings of miniImageNet, the proposed method improves classification accuracy over prototypical networks by 1.9% and 0.32%, respectively; on the 5-way 1-shot setting of tieredImageNet, classification accuracy improves by 0.33% over prototypical networks.
Conclusion
A few-shot image recognition model supervised by class semantic similarity is proposed, which improves the generalization ability of few-shot learning methods and the accuracy of few-shot image recognition.
Objective
Deep learning has made remarkable achievements in many fields such as image recognition, object detection, and speech recognition. However, most of these achievements depend on very large training sets, and existing deep-learning models therefore often need large-scale training data. Building such datasets not only requires substantial manpower and material resources but is also infeasible in scenarios such as collecting many samples of rare image classes. Inspired by the fact that human children can learn to distinguish an object from only a few examples, few-shot image classification aims to identify target categories with only a few labeled samples, thereby addressing the dependence of deep-learning models on large-scale training data. At present, the mainstream methods for few-shot image recognition are based on meta learning, which mainly falls into three categories: metric-based meta learning, optimization-based meta learning, and model-based meta learning. Meta learning proceeds in two stages, training and testing. However, most metric-based meta-learning methods do not use the few samples of the target classes during training, which limits the generalization ability of these models: they often show high accuracy during training, but recognition of few-shot image categories at test time is poor, because the learned deep feature representation cannot be effectively generalized to the target classes. To improve the generalization ability of few-shot image recognition, this study proposes a few-shot learning method based on class semantic similarity supervision.
Method
The proposed method consists of two parts: the first step obtains the class similarity matrix over the classes of the image dataset, and the second step uses this matrix as additional supervision information to train the few-shot image recognition model. The details are as follows. The unsupervised word-vector learning algorithm GloVe (global vectors for word representation) is trained on a Common Crawl corpus containing on the order of a billion web pages and produces a 300-dimensional vector for every word. For classes whose names contain more than one word, all words of the name are looked up in the trained GloVe model, and their word-embedding vectors are averaged to obtain the embedding of the class name. The cosine distance between class-name embeddings is then used to represent the semantic similarity between classes. In addition to the negative log-likelihood loss over category labels used by the original prototypical networks, this study introduces the semantic similarity measure between categories as extra supervision during training, establishing an implicit relationship between the source classes and the few-shot target classes and thereby giving the model better generalization ability. Furthermore, the class semantic similarity loss constrains the intra-class and inter-class sample features learned by the model, so that features within each class become more similar and the distribution of features across different classes better matches the semantic similarity between categories. By supervising the training process with this loss, the proposed model implicitly learns the relationships between different classes and obtains a feature representation with stronger class-feature constraint and generalization abilities.
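To make the training objective concrete, the following sketch shows one way the class semantic similarity could be attached to the prototypical-networks loss. The abstract does not specify the exact form of the similarity loss, so the KL-divergence term, the weight alpha, and the temperature tau below are assumptions for illustration only, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def prototypes(support_feats, support_labels, n_way):
    """Class prototypes = mean of the support features of each episode class."""
    return torch.stack([support_feats[support_labels == c].mean(0)
                        for c in range(n_way)])

def episode_loss(query_feats, query_labels, protos, episode_sim,
                 alpha=0.5, tau=1.0):
    """Prototypical-networks loss plus a hypothetical semantic-similarity term.

    episode_sim: (n_way, n_way) torch tensor, the slice of the class similarity
    matrix for the classes sampled in this episode. The KL formulation of the
    extra term is an assumption for illustration, not the paper's exact loss."""
    # Negative squared Euclidean distance to each prototype -> class logits.
    logits = -torch.cdist(query_feats, protos) ** 2
    ce = F.cross_entropy(logits, query_labels)

    # Extra supervision: push the predicted class distribution of each query
    # toward the semantic-similarity profile of its ground-truth class.
    target = F.softmax(episode_sim[query_labels] / tau, dim=1)
    sem = F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
    return ce + alpha * sem
```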
Result
The proposed model is compared with several state-of-the-art few-shot image classification models, including prototypical networks, matching networks, relation networks, and other classic methods. Extensive experiments are conducted on miniImageNet and tieredImageNet, and the results show that the proposed method is effective and competitive with current advanced methods. To ensure a fair comparison, the classical episodic paradigm of meta learning is used to train and test the model, and experiments are conducted on the widely used 5-way 1-shot and 5-way 5-shot settings. On the 5-way 1-shot and 5-way 5-shot settings of miniImageNet, the classification accuracy of the proposed method improves by 1.9% and 0.32%, respectively, over the classical meta-learning method prototypical networks. On the 5-way 1-shot setting of tieredImageNet, classification accuracy improves by 0.33% over prototypical networks, and on the 5-way 5-shot setting of tieredImageNet the proposed model achieves a competitive result. In addition, several ablation experiments verify the effectiveness of the key modules of the proposed method, and the influence of the prior information of class semantic similarity on the experimental results is analyzed from multiple perspectives.
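For reference, the 5-way 1-shot and 5-way 5-shot settings mentioned above follow the standard episodic protocol; the sketch below shows a generic N-way K-shot episode sampler. The query-set size of 15 images per class is an assumed, commonly used value rather than a detail given in this abstract.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot episode: indices of support and query images.

    labels: list of class labels, one per image in the dataset split.
    Returns (support_indices, query_indices, episode_classes)."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    episode_classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for c in episode_classes:
        picked = random.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])
        query.extend(picked[k_shot:])
    return support, query, episode_classes
```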
Conclusion
This study proposes a few-shot image recognition model based on class semantic similarity supervision, which improves the generalization ability and the class-feature constraint ability of the few-shot image recognition model. Experimental results show that the proposed method improves the accuracy of few-shot image recognition.
Keywords: few-shot learning; image recognition; feature representation; class semantic similarity supervision; generalization ability
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1703.03400.pdf
Garcia V and Bruna J. 2018. Few-shot learning with graph neural networks[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1711.04043.pdf
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Munkhdalai T and Yu H. 2017. Meta networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: 2554-2563
Nichol A, Achiam J and Schulman J. 2018. On first-order meta-learning algorithms[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1803.02999.pdf
Oreshkin B N, Rodriguez P and Lacoste A. 2018. TADAM: task dependent adaptive metric for improved few-shot learning//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 721-731
Pennington J, Socher R and Manning C. 2014. GloVe: global vectors for word representation//Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar: Association for Computational Linguistics: 1532-1543 [DOI: 10.3115/v1/D14-1162]
Rahman S, Khan S and Porikli F. 2018. A unified approach for conventional zero-shot, generalized zero-shot, and few-shot learning. IEEE Transactions on Image Processing, 27(11): 5652-5667[DOI:10.1109/TIP.2018.2861573]
Ravi S and Larochelle H. 2016. Optimization as a model for few-shot learning[EB/OL]. [2020-05-15]. https://openreview.net/pdf?id=rJY0-Kcll
Ren M Y, Triantafillou E, Ravi S, Snell J, Swersky K, Tenenbaum J B, Larochelle H and Zemel R S. 2018. Meta-learning for semi-supervised few-shot classification[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1803.00676.pdf
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Hong Z H, Karpathy A, Khosla A, Bernstein M, Berg A C and Li F F. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252[DOI:10.1007/s11263-015-0816-y]
Rusu A A, Rao D, Sygnowski J, Vinyals O, Pascanu R, Osindero S and Hadsell R. 2019. Meta-learning with latent embedding optimization[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1807.05960.pdf
Santoro A, Bartunov S, Botvinick M, Wierstra D and Lillicrap T. 2016. One-shot learning with memory-augmented neural networks[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1605.06065v1.pdf
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1503.03832v3.pdf
Snell J, Swersky K and Zemel R S. 2017. Prototypical networks for few-shot learning[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1703.05175.pdf
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208 [DOI: 10.1109/CVPR.2018.00131]
Szegedy C, Ioffe S, Vanhoucke V and Alemi A. 2016. Inception-v4, inception-resnet and the impact of residual connections on learning[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/1602.07261v2.pdf
Tseng H Y, Lee H Y, Huang J B and Yang M H. 2020. Cross-domain few-shot classification via learned feature-wise transformation[EB/OL]. [2020-05-15]. https://arxiv.org/pdf/2001.08735.pdf
van der Maaten L and Hinton G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86): 2579-2605
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K and Wierstra D. 2016. Matching networks for one shot learning//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc. : 3630-3638