Two-stream deep transfer learning with multi-source domain confusion
2019, Vol. 24, No. 12, pp. 2243-2254
Received: 2019-04-12; Revised: 2019-07-04; Accepted: 2019-07-11; Published in print: 2019-12-16
DOI: 10.11834/jig.190095
Objective
Deep learning depends heavily on large sample sets. To address this problem, a two-stream deep transfer learning method with multi-source domain confusion is proposed, which improves the applicability of the transferred features over conventional deep transfer learning.
Method
A multi-source domain transfer strategy is adopted to increase the coverage of the target domain's transferable features by the source domains. A two-stage adaptation learning method is proposed to obtain domain-invariant deep feature representations and similar predictions from the inter-domain classifiers. Two-dimensional features from natural light images and three-dimensional features from depth images are fused, which enriches the feature dimensions of the few-shot data while suppressing the interference of complex backgrounds on target recognition. In addition, to improve classifier performance in few-shot machine learning, a center loss is introduced into the conventional softmax loss to strengthen the penalty and supervision capability of the classification loss.
Result
Comparative experiments on a public few-shot gesture dataset show that the proposed model achieves higher recognition accuracy than conventional recognition and transfer models; with DenseNet-169 as the pre-trained network, the recognition rate reaches 97.17%.
Conclusion
A two-stream deep transfer learning model with multi-source domain confusion is constructed from multi-source datasets, two-stage adaptation learning, two-stream convolutional fusion, and a composite loss function. The proposed model increases the matching rate between the source and target data distributions, enriches the feature dimensions of the target samples, and strengthens the supervision of the loss function, improving the applicability of transferred features in arbitrary few-shot scenarios.
Objective
Feature extraction can be completed automatically by the nonlinear network structures of deep learning; thus, multi-dimensional features can be obtained through the distributed representation of features. Deep convolutional neural networks must be supported by a large volume of valid data. However, obtaining a large volume of effectively labeled data is often labor-intensive and time-consuming. Hence, deep learning on large labeled datasets remains a challenge. Presently, deep convolutional neural networks for few-shot datasets have become a popular research topic in deep learning, and combining deep learning with transfer learning is the latest approach to the problem of data scarcity. In this paper, two-stream deep transfer learning with multi-source domain confusion is proposed to address the limited adaptation of the source model's general features to the target data.
Method
The proposed network builds on the domain-confusion deep transfer learning model. First, a multi-source domain transfer strategy is used to increase the coverage of the target domain's transferable features by the source domains.
Second, a two-stage adaptation learning method is proposed to achieve domain-invariant deep feature representations and similar recognition results from the inter-domain classifiers, as sketched below.
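The abstract does not spell out the training procedure, so the following is only a minimal sketch of how such a two-stage schedule could look, assuming PyTorch; the function and argument names are hypothetical, and align_loss and disagree_loss stand in for the feature-alignment and classifier-difference terms sketched later in this section.

import torch.nn.functional as F

def two_stage_adapt(features, clf_src, clf_tgt, src_batches, tgt_batches, opt,
                    align_loss, disagree_loss, epochs=(10, 10)):
    # Stage 1: supervised source loss plus a cross-domain feature-alignment
    # penalty, yielding a domain-invariant deep representation.
    for _ in range(epochs[0]):
        for (xs, ys), xt in zip(src_batches, tgt_batches):
            fs, ft = features(xs), features(xt)
            loss = F.cross_entropy(clf_src(fs), ys) + align_loss(fs, ft)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Stage 2: shrink the disagreement between the per-domain classifiers
    # on target samples so their recognition results become similar.
    for _ in range(epochs[1]):
        for xt in tgt_batches:
            ft = features(xt)
            loss = disagree_loss(clf_src(ft), clf_tgt(ft))
            opt.zero_grad()
            loss.backward()
            opt.step()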
Third, a data fusion strategy that combines the two-dimensional features of natural light images with the three-dimensional features of depth images is proposed to enrich the feature dimensions of few-shot datasets and suppress the influence of complex backgrounds.
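As one way to realize this fusion, the following minimal sketch (assuming PyTorch/torchvision; the choice of backbone, the concatenation operator, and the class count are illustrative, not taken from the paper) runs one DenseNet-169 stream per modality and fuses at the feature level:

import torch
import torch.nn as nn
from torchvision import models

class TwoStreamFusion(nn.Module):
    # One backbone per modality: natural-light (RGB) images and depth maps
    # (a single-channel depth map can be replicated to three channels).
    def __init__(self, num_classes):
        super().__init__()
        self.rgb_stream = models.densenet169(pretrained=True).features
        self.depth_stream = models.densenet169(pretrained=True).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # DenseNet-169 feature maps have 1664 channels; concatenating the
        # two streams doubles that before the classifier.
        self.classifier = nn.Linear(2 * 1664, num_classes)

    def forward(self, rgb, depth):
        f_rgb = self.pool(self.rgb_stream(rgb)).flatten(1)
        f_depth = self.pool(self.depth_stream(depth)).flatten(1)
        return self.classifier(torch.cat([f_rgb, f_depth], dim=1))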
Finally, a composite loss function that combines the softmax and center losses is presented to improve the recognition performance of the classifier in few-shot deep learning; it shortens intra-class distances and expands inter-class distances.
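A minimal sketch of such a composite loss, assuming the widely used center-loss formulation; the weighting factor lam and the initialization are illustrative, not the paper's settings:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    # Maintains one learnable center per class and penalizes the squared
    # distance between each feature and the center of its class.
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        return (features - self.centers[labels]).pow(2).sum(dim=1).mean()

def composite_loss(logits, features, labels, center_loss, lam=0.01):
    # Softmax cross-entropy separates classes (inter-class), while the
    # center term pulls features toward their class centers (intra-class).
    return F.cross_entropy(logits, labels) + lam * center_loss(features, labels)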
The proposed method increases the recognition rate by improving both the feature extraction and the loss function of the deep convolutional neural network. Regarding feature extraction, the efficiency of feature transfer is enhanced, and the feature parameters of few-shot datasets are enriched by multi-source deep transfer features and feature fusion. The efficiency of multi-source domain feature transfer is improved with three kinds of loss functions. The inter-class and intra-class feature distances are adjusted by introducing the center loss function.
To extract the deep adaptation features, the difference loss of the domain-invariant deep feature representation is calculated, and the inter-domain features are aligned with one another.
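The abstract does not name the specific difference loss; a common choice for this kind of domain confusion is the maximum mean discrepancy (MMD). A minimal sketch under that assumption, summed over several source domains to reflect the multi-source strategy (the function names and kernel bandwidth are illustrative):

import torch

def gaussian_mmd(x, y, sigma=1.0):
    # Squared maximum mean discrepancy between two feature batches under a
    # Gaussian kernel: a standard measure of distribution difference.
    def kernel(a, b):
        d = torch.cdist(a, b).pow(2)
        return torch.exp(-d / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def multi_source_alignment(source_feats, target_feat):
    # Sum the discrepancy between the target features and each source domain,
    # so every source contributes to covering the target distribution.
    return sum(gaussian_mmd(fs, target_feat) for fs in source_feats)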
In addition, the mutual adaptation of the different domain classifiers is designed with a difference loss between their predictions (a minimal sketch follows this paragraph). A two-stream deep transfer learning model with multi-source domain confusion is developed by combining the above methods. The model enhances the characterization of targets against complex backgrounds while improving the applicability of the transferred features. Gesture recognition experiments are conducted on public datasets to verify the validity of the proposed model, and quantitative comparisons show that its performance is superior to that of other classical gesture recognition models.
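One plausible instantiation of that classifier-difference term, assuming it is measured as the mean absolute difference between the two classifiers' softmax outputs on the same target batch (the abstract does not fix the exact measure):

import torch.nn.functional as F

def classifier_discrepancy(logits_a, logits_b):
    # Mean absolute difference between the two domain classifiers'
    # class-probability outputs; minimizing it pushes the classifiers
    # toward consistent predictions on target samples.
    return (F.softmax(logits_a, dim=1) - F.softmax(logits_b, dim=1)).abs().mean()

In the two-stage schedule sketched earlier, this term would serve as disagree_loss, while the MMD term would serve as align_loss.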
Result
The two-stream deep transfer learning model with multi-source domain confusion demonstrates more effective gesture recognition on few-shot datasets than previous models. With the DenseNet-169 pre-trained network, the proposed network achieves 97.17% accuracy, 2.34% higher than other classic gesture recognition and transfer learning models. The recognition performance of the proposed model on a small gesture sample dataset is evaluated through two comparisons. First, compared with other transfer learning models, the proposed two-stream fusion framework with multi-source domain confusion transfer learning completes the transfer of features more effectively. Second, the performance of the proposed fusion model is superior to that of the traditional two-stream information fusion model, which verifies that the proposed fusion strategy improves recognition efficiency while effectively combining natural light and depth image features.
Conclusion
A deep transfer learning method with multi-source domain confusion is proposed. By studying the principles and mechanisms of deep learning and transfer learning, a multi-source domain transfer method that covers the characteristics of the target domain is developed. First, an adaptable feature is introduced to enhance the descriptive capability of the transferred features. Second, a two-stage adaptation learning method is proposed to obtain domain-invariant deep feature representations and reduce the prediction differences of the inter-domain classifiers. Third, combined with the three-dimensional feature information of depth images, a two-stream convolution fusion strategy is proposed that makes full use of scene information. Through the fusion of natural light and depth imaging, the capability to separate foreground from background is improved, and the data fusion strategy recombines the two types of modal information. Finally, the efficiency of multi-source domain feature transfer is improved by three kinds of loss functions. To improve the recognition performance of the classifier on few-shot datasets, the penalty imposed on inter-class and intra-class features is adjusted by adding a center loss to the softmax loss. The inter-domain features are adapted to one another by computing the loss of the domain-invariant deep features, and the mutual adaptation of the different domain classifiers is driven by the difference loss between the inter-domain classifiers. The two-stream deep transfer learning model with multi-source domain confusion is produced through two-stage adaptation learning, which facilitates feature transfer from the source domains to the target domain. The model structure is designed by combining the proposed deep transfer learning method with the multi-source domain confusion data fusion strategy. On the public gesture dataset, the superior performance of the proposed model is verified through comparisons from multiple angles. Experimental results show that the proposed method increases the matching rate between the source and target domains, enriches the feature dimensions, and enhances the penalty supervision capability of the loss function, improving the recognition accuracy of the deep transfer network on few-shot datasets.