Representative feature networks for few-shot learning
2019, Vol. 24, No. 9, pp. 1514-1527
Received: 2018-11-12
Revised: 2019-04-16
Published in print: 2019-09-16
DOI: 10.11834/jig.180629
Objective
Few-shot learning aims to correctly classify test samples when only a small number of labeled samples are provided. Metric-learning-based few-shot methods map samples into an embedding space and compute distances as similarity measures to predict categories, but they fail to summarize, from the multiple support vectors within a class, a representative feature that expresses the class concept, which limits further improvement of classification accuracy. To address this problem, this paper proposes the representative feature network, which improves classification performance significantly.
Method
The representative feature network adopts a metric learning strategy based on class representative features: the representative feature learned from the support vector set of each class effectively expresses the class concept and is used to classify test samples. Specifically, the network consists of two modules. An embedding module first extracts high-level embedding vectors; the stacked embedding vectors are then fed into a representative feature module to obtain the representative feature of each class. The category of a test sample is predicted by computing the distances between its embedding vector and the class representative features. Finally, the proposed mixture loss function is used to enlarge the inter-class distances in the embedding space and reduce the misclassification of similar classes.
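The abstract does not give the exact layer configuration of the two modules, so the following is only a minimal PyTorch sketch of the pipeline described above; the names EmbeddingModule, RepresentativeFeatureModule, the scoring layer, and all sizes are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class EmbeddingModule(nn.Module):
    """Toy stand-in for the embedding module: maps an image to an embedding vector."""
    def __init__(self, in_channels=1, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        return self.conv(x).flatten(1)         # (B, dim)

class RepresentativeFeatureModule(nn.Module):
    """Learns a weight for each support embedding and aggregates the stacked
    support vectors of a class into one representative feature per class."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)         # scores how much each support vector should count

    def forward(self, support):                # support: (n_way, k_shot, dim)
        w = torch.softmax(self.score(support), dim=1)   # weights sum to 1 within each class
        return (w * support).sum(dim=1)        # (n_way, dim) representative features

def classify(query_emb, reps):
    """Predict by squared Euclidean distance to each class representative feature."""
    d = torch.cdist(query_emb, reps) ** 2      # (n_query, n_way)
    return (-d).softmax(dim=1)                 # smaller distance -> higher probability

# Example episode: 5-way 5-shot with 28x28 grayscale images (Omniglot-like).
if __name__ == "__main__":
    n_way, k_shot, n_query = 5, 5, 15
    embed, rep = EmbeddingModule(), RepresentativeFeatureModule()
    support_imgs = torch.randn(n_way * k_shot, 1, 28, 28)
    query_imgs = torch.randn(n_query, 1, 28, 28)
    support = embed(support_imgs).view(n_way, k_shot, -1)   # stack support embeddings per class
    reps = rep(support)                                      # one representative feature per class
    probs = classify(embed(query_imgs), reps)                # (n_query, n_way)
    print(probs.argmax(dim=1))
```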
Result
Extensive experiments on the Omniglot, miniImageNet, and Cifar100 datasets verify that the proposed model not only achieves the best classification accuracy reported to date but also maintains high training efficiency.
Conclusion
The representative feature network can effectively summarize a representative feature from the multiple support vectors of a class and use it to classify test samples. Compared with using the support vectors directly, classification with representative features is more robust, and the accuracy of few-shot classification is further improved.
Objective
Few-shot learning aims to build a classifier that recognizes new unseen classes given only a few samples. The solutions fall mainly into the following categories: data augmentation, meta-learning, and metric learning. Data augmentation can reduce over-fitting to some extent given the limited data available for a new class; a typical solution is to augment data in the feature domain by hallucinating features. These methods have a certain effect on few-shot classification. However, because the data space is extremely small, the possible transformations are considerably limited and cannot solve the over-fitting problem. Meta-learning methods suit few-shot learning because they rely on a high-level strategy learned across similar tasks. Some methods learn good initial values, some learn task-level update strategies, and others construct external memory storage to remember past information for comparison during testing. The few-shot classification results of these methods are superior, but their network structures become increasingly complicated owing to the use of RNNs (recurrent neural networks), and their efficiency is low. Metric learning methods are simple and efficient. They first map a sample to an embedding space and then compute distances to obtain a similarity metric for predicting the category. Some approaches improve the representation of features in the embedding space, some use learnable distance metrics to compute the distance for the loss, and others combine meta-learning methods to improve accuracy. However, this type of method fails to summarize representative features from the multiple support vectors in a class to effectively represent the class concept. This drawback limits further improvement of few-shot classification accuracy. To address this problem, this study proposes a representative feature network.
Method
The representative feature network is based on a metric learning strategy with class representative features. It uses the representative feature learned from the support vector set of a class to express the class concept effectively, and it uses a mixture loss to reduce the misclassification of similar classes, thus achieving excellent classification results. Specifically, the representative feature network includes two modules. An embedding module extracts embedding vectors at a high abstraction level, and a representative feature module then takes the stacked support vectors as input and outputs the representative feature of each class. The class representative feature fully accounts for the influence of each support sample's embedding vector, whether or not the target in the sample is obvious. Using the network to learn different weights for each embedded support vector effectively avoids the misclassification that would be caused by a representative feature biased toward samples whose targets are not obvious. Then, the distances from the embedded query vector to each class representative feature are computed to predict the class. In addition, a mixture loss function is proposed to reduce the misclassification of similar classes in the embedding space: cross-entropy loss is combined with a relative-error loss to increase the inter-class distances and reduce the error rate on similar classes.
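The abstract states that cross-entropy is combined with a relative-error loss but does not give the exact form of the latter. The sketch below assumes one plausible form, a ratio of the true-class distance to the distance of the nearest wrong class; the name mixture_loss, the weight lam, and the small stabilizer are hypothetical choices for illustration only.

```python
import torch
import torch.nn.functional as F

def mixture_loss(query_emb, reps, labels, lam=0.5):
    """Hypothetical mixture loss: cross-entropy on distance-based logits plus a
    relative-error term comparing the true-class distance with the distance to
    the nearest wrong class, which pushes classes apart in the embedding space."""
    d = torch.cdist(query_emb, reps) ** 2                 # (n_query, n_way) squared distances
    ce = F.cross_entropy(-d, labels)                      # smaller distance -> larger logit

    d_true = d.gather(1, labels.unsqueeze(1)).squeeze(1)  # distance to the correct representative
    d_wrong = d.masked_fill(                              # mask the true class, keep nearest other
        F.one_hot(labels, d.size(1)).bool(), float("inf")
    ).min(dim=1).values
    rel = (d_true / (d_wrong + 1e-8)).mean()              # want d_true << d_wrong

    return ce + lam * rel
```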
Result
Extensive experiments on the Omniglot, miniImageNet, and Cifar100 datasets verify that the model achieves state-of-the-art results. On the simple Omniglot dataset, the five-way five-shot classification accuracy is 99.7%, which is 1% higher than that of the original matching network. On the more complex miniImageNet dataset, the five-way five-shot classification accuracy is 75.83%, approximately 18% higher than that of the original matching network. Representative features contribute approximately 8% of this improvement, indicating that they effectively express the class prototype by distinguishing the contributions of support vectors whose targets may or may not be obvious. The mixture loss contributes approximately 1%, indicating that it reduces some misclassification of similar classes in the testing set; the improvement is modest because similar samples are uncommon in the dataset. The remaining 9% improvement comes from fine-tuning on the test set, indicating that the skip-connection structure benefits loss propagation compared with the original connections between network modules. On the Cifar100 dataset, the five-way five-shot classification accuracy is 87.99%, which is 20% higher than that of the original matching network. Moreover, high training efficiency is maintained while performance is significantly improved.
Conclusion
To address the problem that the original embedding networks are too simple to extract high-level features of samples, the improved embedding network in the representative feature network uses a skip-connection structure to deepen the network and extract more advanced features. To address the problem that noisy support vectors disturb the classification of a testing sample, the representative feature network effectively summarizes a representative feature from the multiple support vectors in a class for classifying query samples. Compared with using the support vectors directly, classification with representative features is more robust, and the classification accuracy under few-shot conditions is further improved. In addition, the mixture loss function, proposed for the problem of classifying similar classes, enlarges the distances between categories in the embedding space and reduces the misclassification of similar classes. Detailed experiments verify that these improvements achieve strong performance in few-shot learning tasks on the Omniglot, miniImageNet, and Cifar100 datasets. At the same time, the representative feature network still has room for improvement: for the embedding network, advanced structures such as dense connections or SE modules should be incorporated in future work to further improve the results.
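The conclusion credits a skip-connection structure for the deeper embedding network but does not describe its exact configuration; the following is a generic residual-style block sketched in PyTorch under that assumption, with SkipConvBlock and the channel sizes chosen only for illustration.

```python
import torch
import torch.nn as nn

class SkipConvBlock(nn.Module):
    """Illustrative convolutional block with a skip connection: the input is added
    back to the convolved output, so gradients also flow through the shortcut."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the shortcut matches the output channel count
        self.shortcut = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))

# A deeper embedding network could be built from such blocks, e.g.:
embed = nn.Sequential(
    SkipConvBlock(3, 64), nn.MaxPool2d(2),
    SkipConvBlock(64, 64), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
print(embed(torch.randn(2, 3, 32, 32)).shape)   # torch.Size([2, 64])
```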