发布时间: 2019-12-16
图像分析和识别
收稿日期: 2019-04-12; 修回日期: 2019-07-04; 预印本日期: 2019-07-11
基金项目: 国家自然科学基金项目(61472037,61433003)
第一作者简介:
闫美阳, 1994年生, 女, 硕士研究生, 主要研究方向为计算机视觉。E-mail:13752593085@163.com.
中图法分类号: TP301.6
文献标识码: A
文章编号: 1006-8961(2019)12-2243-12
摘要
目的 针对深度学习严重依赖大样本的问题,提出多源域混淆的双流深度迁移学习方法,提升传统深度迁移学习中迁移特征的适用性。方法 采用多源域的迁移策略,增大源域对目标域迁移特征的覆盖率;提出两阶段适配学习的方法,获得域不变的深层特征表示和域间分类器相似的识别结果;将自然光图像2维特征和深度图像3维特征进行融合,在提高小样本数据特征维度的同时抑制复杂背景对目标识别的干扰。此外,为改善小样本机器学习中分类器的识别性能,在传统的softmax损失中引入中心损失,增强分类损失函数的惩罚监督能力。结果 在公开的少量手势样本数据集上进行对比实验,结果表明,相对于传统的识别模型和迁移模型,基于本文模型的识别准确率更高,在以DenseNet-169为预训练网络的模型中,识别率达到了97.17%。结论 利用多源域数据集、两阶段适配学习、双流卷积融合以及复合损失函数,构建了多源域混淆的双流深度迁移学习模型。所提模型可增大源域和目标域的数据分布匹配率、丰富目标样本特征维度、提升损失函数的监督性能,改善小样本场景下迁移特征的适用性。
关键词
小样本; 迁移学习; 多源域; 双流卷积融合; 域混淆
Abstract
Objective Feature extraction can be completed automatically by using the nonlinear network structures of deep learning. Thus, multi-dimensional features can be obtained through the distributed expression of features. Deep convolutional neural networks rely on a large volume of valid data. However, obtaining a large volume of effectively labeled data is often labor-intensive and time-consuming. Hence, deep learning that depends on large labeled datasets remains a challenge. Presently, deep convolutional neural networks on few-shot datasets have become a popular research topic in deep learning, and deep learning combined with transfer learning is the latest approach to the problem of data poverty. In this paper, two-stream deep transfer learning with multi-source domain confusion is proposed to address the limited adaptation of the general features that the source model extracts on the target data. Method The proposed deep transfer learning network is based on the confusion-domain deep transfer learning model. First, a multi-source domain transfer strategy is used to increase the coverage of target-domain transfer features by the source domains. Second, a two-stage adaptive learning method is proposed to achieve domain-invariant deep feature representations and similar recognition results from the inter-domain classifiers. Third, a data fusion strategy for natural light images with two-dimensional features and depth images with three-dimensional features is proposed to enrich the feature dimensions of few-shot datasets and suppress the influence of complex backgrounds. Finally, a composite loss function is presented with the softmax and center loss functions to improve the recognition performance of the classifier in few-shot deep learning; intra- and inter-class distances are shortened and expanded, respectively.
The proposed method increases the recognition rate by improving the feature extraction and loss function of the deep convolutional neural network. Regarding feature extraction, the efficiency of feature transfer is enhanced, and the feature parameters of few-shot datasets are enriched by multi-source deep transfer features and feature fusion. The efficiency of multi-source domain feature transfer is improved with three kinds of loss functions. The inter- and intra-class feature distances are adjusted by introducing the center loss function. To extract the deep adaptation features, the difference loss of the domain-invariant deep feature representation is calculated, and the inter-domain features are aligned with one another. In addition, the mutual adaptation of different domain classifiers is designed with the difference loss function. A two-stream deep transfer learning model with multi-source domain confusion is developed by combining the above methods. The model enhances the characterization of targets in complex contexts while improving the applicability of transfer features. Gesture recognition experiments are conducted on public datasets to verify the validity of the proposed model. Quantitative analysis of comparative experiments shows that the performance of the proposed model is superior to that of other classical gesture recognition models. Result The two-stream deep transfer learning model with multi-source domain confusion demonstrates more effective gesture recognition performance on few-shot datasets than previous models. In the model with the DenseNet-169 pre-training network, the proposed network achieves 97.17% accuracy. Compared with other classic gesture recognition and transfer learning models, the two-stream deep transfer learning model with multi-source domain confusion has 2.34% higher accuracy. The recognition performance of the proposed model on a small gesture sample dataset is evaluated through comparison as follows.
First, compared with other transfer learning models, the proposed framework of the two-stream fusion model with multi-source domain confusion transfer learning can effectively complete the transfer of features. Second, the performance of the proposed fusion model is superior to that of the traditional two-stream information fusion model, which verifies that the proposed fusion model can improve recognition efficiency while effectively combining natural light and depth image features. Conclusion A deep transfer learning method with multi-source domain confusion is proposed. By studying the principle and mechanism of deep learning and transfer learning, a multi-source domain transfer method that covers the characteristics of the target domain is proposed. First, an adaptable feature is introduced to enhance the description capability of the transfer feature. Second, a two-stage adaptive learning method is proposed to represent the deep features of the invariant domain and reduce the prediction differences of inter-domain classifiers. Third, combined with the three-dimensional feature information of the depth image, a two-stream convolution fusion strategy that can realize the full use of scene information is proposed. Through the fusion of natural light imaging and depth information, the capability to segment the foreground and background in the image is improved, and the data fusion strategy realizes the recombination of the two types of modal information. Finally, the efficiency of multi-source domain feature transfer is improved by three kinds of loss functions. To improve the recognition performance of the classifier in few-shot datasets, the penalty performance of classifiers on inter- and intra-class features is adjusted by introducing center loss to softmax loss. The inter-domain features are adapted to one another by calculating the loss of the domain-invariant deep feature.
The mutual adaptation of different domain classifiers is designed with the difference loss function of inter-domain classifiers. The two-stream deep transfer learning model with multi-source domain confusion is generated through two-stage adaptive learning, which can facilitate the feature transfer from the source domain to the target domain. The model structure of the two-stream deep transfer learning with multi-source domain confusion is designed by combining the proposed deep transfer learning method and data fusion strategy with multi-source domain confusion. On the public gesture dataset, the superior performance of the proposed model is verified through comparisons from multiple angles. Experimental results prove that the proposed method can increase the matching rate of the source and target domains, enrich the feature dimension, and enhance the penalty supervision capability of the loss function. The proposed method can improve the recognition accuracy of the deep transfer network on few-shot datasets.
Key words
few-shot datasets; transfer learning; multi-source domain; two-stream convolution fusion; domain confusion
0 引言
近年来,深度学习模型在图像处理领域的能力得到了显著提升,已成为人工智能领域最为活跃的研究热点之一。深度学习中自主学习图像特征的方法,一般需要海量的标注数据做支撑,在小规模图像数据集上应用深度卷积神经网络,得到的效果与传统的人工提取特征法相比,并没有明显的提升,而获得大量的标注数据往往需要耗费极高的人力、物力以及时间成本。因此,如何有效地解决数据贫乏问题已经成为深度学习领域的热点研究问题。由于数据缺乏严重制约了深度学习的发展,学术界开始研究相关算法并引入到深度学习中,研究方向包括迁移学习和数据增强。
在迁移学习方面,具体有模型迁移学习、度量空间学习以及元数据学习3类形式。Oquab等人(2014)最早发表了模型迁移学习的研究成果,将ImageNet(Deng等,2009)大数据集上训练得到的AlexNet(Krizhevsky等,2012)模型应用到小样本数据集上,借助学习率和全连接层输出参数的调整,获得目标模型;Koch等人(2015)的孪生网络、Vinyals等人(2016)的匹配网络、Snell等人(2017)的原型网络以及Garcia等人(2017)提出的图神经网络,都是借助度量空间对样本特征间的距离分布进行建模,进而实现同类样本靠近、异类样本远离的目标;元学习即学会学习,通过学习大量的任务,获得内在的元知识,利用神经网络学会比较元知识与新知识的区别,快速处理同类的新任务。如Santoro等人(2016)和Ravi等人(2017)结合神经网络图灵机与长短期记忆网络形成的元分类器以及Finn等人(2017)通过梯度下降策略训练的元分类器实现了元知识的多任务适应。
在数据增强方面,具体有基于神经网络、特征映射以及图像处理3类形式。基于神经网络的数据增强,以生成对抗网络(GAN)为代表,Huang等人(2018)提出数据增强式GAN,利用生成和判别的联合损失函数,实现跨领域的数据增广;特征映射的数据增强方法,如Chen等人(2018)提出的语义增强手段,借助语义空间的丰富信息,通过编码器将视觉特征映射到语义空间,实现视觉特征在语义空间的特征增广,最后从语义空间映射回视觉空间获取增广后的图像样本。利用图像处理的数据增强方法,主要有颜色抖动、主成分抖动、随机剪切、尺度变换、水平或垂直翻转、旋转或仿射变换以及添加噪声等方法,如Liu等人(2017)借助样本平移、添加噪声以及线性组合等进行数据增广,解决了拉曼光谱数据集样本数量少和不均衡的问题,与传统的机器学习算法相比,识别准确度提高了20%~40%。
基于迁移学习和数据增强的算法在解决少量样本深度学习上都存在一些问题。其中,基于迁移学习算法的缺点主要有:1)由于提取特征的能力与模型的网络结构存在紧密联系,因此模型迁移方法的识别性能过分依赖于预训练模型;2)在度量空间学习中,源域与目标域的相似程度作为唯一的迁移学习手段,其度量距离标准的选择严重影响了最终的实验结果;3)元学习方法大幅度增加了迁移学习算法的实现难度,如长短期记忆网络(LSTM)是构建元关系的常用网络,该网络结构的运算存在一定的复杂度,其模型训练也不易于实现。另外,基于数据增强的算法也存在一些问题:1)数据增强即利用特定手段对数据进行扩增,将小样本数据变换成大数据集,其网络输入的原始数据实质仍是大样本数据,没有根本解决小样本深度学习问题;2)针对生成对抗网络的数据增广方法,因将原始数据与增广后数据的相似性作为训练标准,所以增广后的特征维度不会发生较大改变,因此其数据增强的效果并不明显;3)针对特征映射的图像增广方法,其映射规律尤为重要,否则会生成与原始数据集类别不一致的样本。
针对以上问题,本文提出了一种多源域混淆的双流深度迁移学习方法,借助多源域混淆的手段提高迁移特征对目标域特征的适应能力,通过双流卷积网络实现目标特征不同维度的融合,并利用复合损失函数增强识别模型的分类性能。该方法在未经数据增强的原始目标域数据集上进行验证,结果显示,相比于传统的迁移学习,所提方法在多个预训练模型上的准确率和训练收敛速率都有明显提高。
1 双流深度迁移学习模型构建
本文算法针对多源域混淆的双流深度迁移学习模型的构建阶段,主要由3个步骤组成:1)通过分析源域数量对迁移特征的影响,提出多源域与目标域的两阶段适配学习策略;2)通过结合多源特征迁移以及双流卷积特征融合的策略,设计目标域特征提取的操作流程;3)建立模型的复合损失函数。
1.1 多源域混淆的深度迁移学习策略
对于少量样本数据集,在源域和目标域数据样本相似度较低的情况下,通过深度神经网络和迁移学习的结合得到的识别效果并不理想。如图 1所示,相比于单源域,多源域的迁移学习大幅度提高了对目标域的覆盖率,增强了源域特征的迁移效果。
在多源域迁移学习中,当源域和目标域出现数据分布混淆时,将导致相应判别结构的错误对齐,如图 2所示,源域手势类别“w”错误地对齐了目标域手势类别“a”的特征分布。
本文采用深度域混淆的学习策略(DDC)(Tzeng等,2014),通过学习到深层域间共享特征,优化域间概率分布差异和分类误差,实现源域与目标域间的适配学习,提高模型对目标任务的适应度,其优化策略如图 3所示。
DDC主要借助核空间中源域与目标域概率分布均值嵌入的差异,即最大均值差异(MMD)(Borgwardt等,2006)来度量两域的分布距离。已知源域与目标域的样本集分别为
$ \boldsymbol{X}_{\mathrm{s}}=\left\{x_{1}^{\mathrm{s}}, x_{2}^{\mathrm{s}}, \cdots, x_{\left|\boldsymbol{X}_{\mathrm{s}}\right|}^{\mathrm{s}}\right\}, \boldsymbol{X}_{\mathrm{t}}=\left\{x_{1}^{\mathrm{t}}, x_{2}^{\mathrm{t}}, \cdots, x_{\left|\boldsymbol{X}_{\mathrm{t}}\right|}^{\mathrm{t}}\right\} $ | (1) |
MMD的数学表达式为
$ \begin{array}{l} \;\;\;\;\;\;\;\;\;{\mathop{\rm MMD}\nolimits} [F, p, q] = \\ \mathop {\sup }\limits_{\left\| f \right\| \le 1} \left({{E_{{x_i} \sim p}}\left[ {f\left({{x_i}} \right)} \right] - {E_{{x_j} \sim q}}\left[ {f\left({{x_j}} \right)} \right]} \right) \end{array} $ | (2) |
经验估计为
$ \operatorname{MMD}\left[F, \boldsymbol{X}_{\rm{s}}, \boldsymbol{X}_{\rm{t}}\right]=\left\|\sum\limits_{x_{i} \in {\mathit{\pmb{X}}}_{\rm{s}}}\frac{\phi\left(x_{i}\right)}{\left|\boldsymbol{X}_{\rm{s}}\right|}-\sum\limits_{x_{j} \in {\mathit{\pmb{X}}}_{\rm{t}}} \frac{\phi\left(x_{j}\right)}{\left|\boldsymbol{X}_{\rm{t}}\right|}\right\|_{\mathrm{H}} $ | (3) |
$ L_{\mathrm{sm}}\left(\mathit{\pmb{X}}_{\mathrm{t}}, \mathit{\pmb{X}}_{\mathrm{s}}\right)=\operatorname{MMD}^{2}\left(\mathit{\pmb{X}}_{\mathrm{t}}, \mathit{\pmb{X}}_{\mathrm{s}}\right) $ | (4) |
式中,φ(·)为非线性特征映射。将式(3)代入式(4),可得
$ L_{\mathrm{sm}}\left(\boldsymbol{X}_{\mathrm{t}}, \boldsymbol{X}_{\mathrm{s}}\right)=\left\|\sum\limits_{x_{i} \in {\mathit{\pmb{X}}}_{\mathrm{s}}} \frac{\phi\left(x_{i}\right)}{\left|\boldsymbol{X}_{\mathrm{s}}\right|}-\sum\limits_{x_{j} \in {\mathit{\pmb{X}}}_{\mathrm{t}}} \frac{\phi\left(x_{j}\right)}{\left|\boldsymbol{X}_{\mathrm{t}}\right|}\right\|_{\mathrm{H}}^{2} $ | (5) |
对于单源域,以网络深层特征H(·)作为映射,则MMD损失为
$ {L_{{\rm{sm}}}}\left({{{\mathit{\pmb{X}}}_{\rm{t}}}, {\mathit{\pmb{X}}_{\rm{s}}}} \right) = \left\| {\sum\limits_{{x_i} \in {\mathit{\pmb{X}}_{\rm{s}}}} {\frac{{H\left({{x_i}} \right)}}{{\left| {{\mathit{\pmb{X}}_{\rm{s}}}} \right|}}} - \sum\limits_{{x_j} \in {\mathit{\pmb{X}}_{\rm{t}}}} {\frac{{H\left({{x_j}} \right)}}{{\left| {{\mathit{\pmb{X}}_{\rm{t}}}} \right|}}} } \right\|_2^2 $ | (6) |
式中,多源域与目标域的样本集分别定义为
$ {{\mathit{\pmb{X}}}_{\rm{s}}} = \left\{ {\left({{\mathit{\pmb{X}}_{{\rm{s}}j}}, {{\mathit{\pmb{Y}}}_{{\rm{s}}j}}} \right)} \right\}_{j = 1}^N, {\mathit{\pmb{X}}_{\rm{t}}} = \left\{ {\left({x_i^{\rm{t}}, y_i^{\rm{t}}} \right)} \right\}_{i = 1}^{\left| {{\mathit{\pmb{X}}_{\rm{t}}}} \right|} $ | (7) |
随后对深层特征进行MMD损失计算,具体为
$ {L_{{\rm{MMD}}}} = \frac{1}{N}\sum\limits_{j = 1}^N {{L_{{\rm{sm}}}}} \left({F\left({{\mathit{\pmb{X}}_{{\rm{s}}j}}} \right), F\left({{\mathit{\pmb{X}}_{\rm{t}}}} \right)} \right) $ | (8) |
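式(5)和式(8)的MMD损失可用如下numpy代码示意(以恒等映射代替核空间映射φ仅作数值演示,函数名与变量名均为示意假设,并非原文实现):

```python
import numpy as np

def mmd_sq(xs, xt, phi=lambda x: x):
    # 式(5)的经验估计: 两域样本在映射phi下的均值嵌入之差的平方范数
    mu_s = np.mean([phi(x) for x in xs], axis=0)  # (1/|Xs|) * sum(phi(x_i))
    mu_t = np.mean([phi(x) for x in xt], axis=0)  # (1/|Xt|) * sum(phi(x_j))
    return float(np.sum((mu_s - mu_t) ** 2))

def l_mmd(source_domains, xt):
    # 式(8): 对N个源域分别计算与目标域的MMD损失后取平均
    return sum(mmd_sq(xs, xt) for xs in source_domains) / len(source_domains)
```

当源域与目标域分布一致时该损失为0,随两域分布差异增大而增大,从而驱动深层特征向域不变表示靠拢。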
另外,多源域迁移学习中,各源域分类器在目标域样本上的预测差异损失定义为
$ \begin{array}{*{20}{c}} {{L_{{\rm{disc}}}} = \frac{2}{{N \times (N - 1)}} \times }\\ {\sum\limits_{j = 1}^{N - 1} {\sum\limits_{i = j + 1}^N {{E_{x \sim {\mathit{\pmb{X}}_{\rm{t}}}}}} } \left[ {\left| {{C_i}\left({{H_i}\left({F\left(x \right)} \right)} \right) - {C_j}\left({{H_j}\left({F\left(x \right)} \right)} \right)} \right|} \right]} \end{array} $ | (9) |
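式(9)对所有分类器对的预测差异取期望再平均,系数2/(N(N−1))恰为分类器对数的倒数,可示意如下(输入形状与函数名为示意假设):

```python
import itertools
import numpy as np

def l_disc(classifier_probs):
    # classifier_probs: 形状(N, m, k), 即N个源域分类器对目标域m个样本的k类输出
    outs = np.asarray(classifier_probs, dtype=float)
    pairs = list(itertools.combinations(range(outs.shape[0]), 2))
    # 式(9): 对每一对分类器(i, j), 求目标样本上预测差绝对值的均值, 再对所有对取平均
    return sum(float(np.mean(np.abs(outs[i] - outs[j]))) for i, j in pairs) / len(pairs)
```

各分类器预测完全一致时损失为0,迫使不同源域的判别结构在目标域上相互对齐。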
多源域混淆的深度迁移学习策略的实现流程如图 4所示。由图可知,该流程由一个浅层公共特征提取器F、N个深层域特征提取器H_j以及N个域分类器C_j组成。
1.2 多源域迁移及双流融合的特征提取
1.2.1 多源域迁移的特征提取
1.2.2 双流融合的特征提取
由于自然光图像仅包含目标的2维颜色纹理等特征,对于解决实际场景中的目标识别问题具有一定的挑战性。本文采用特征级的数据融合策略,设计双流卷积网络(徐琳琳等,2019),将深度图像的空间3维信息融入到目标的特征信息表达中,通过2维和3维信息的相互补充和约束,提高现实生活中复杂背景下的目标识别效果。为有效融合两类模态信息并抑制网络复杂度的增加,采用卷积层特征级的数据融合方式,通过1×1的卷积核实现再卷积操作,使融合前后特征图的宽度、高度和通道数保持不变。
已知自然光通道和深度通道对应的特征图分别为x^a和x^b,其特征卷积融合过程为
$ \boldsymbol{y}_{i, j, 2d-1}=\boldsymbol{x}_{i, j, d}^{a}, \boldsymbol{y}_{i, j, 2d}=\boldsymbol{x}_{i, j, d}^{b} $ | (10) |
$ \boldsymbol{y}_{i, j, d}^{\mathrm{cvo}}=f_{\mathrm{cvo}}\left(\boldsymbol{x}_{i, j, d}^{a}, \boldsymbol{x}_{i, j, d}^{b}\right) $ | (11) |
上述方法是在合并融合(concatenation fusion)的基础上提出的,其融合方式可以保证网络自动学习到两类模态对应的特征关系,融合后特征的高度、宽度以及特征通道数保持不变,即
$ {{\mathit{\pmb{y}}}_{{\rm{cvo}}}} = {\mathit{\pmb{y}}_{{\rm{cat}}}}*{\mathit{\pmb{w}}} + {\mathit{\pmb{b}}} $ | (12) |
自然光图像和深度图像分别经过数次卷积与池化后,将获取的特征进行合并,经卷积核自动提取特征后,即可完成两类模态特征的相互学习。该融合方法的神经元表达式为
$ \begin{aligned} \alpha_{j}^{l} &=f_{\mathrm{F}}\left\{\boldsymbol{W}^{l}\left[\sum\limits_{i \in \boldsymbol{M}_{\mathrm{R}j}^{l}}\left(\alpha \cdot \boldsymbol{a}_{i}^{l-1} * \boldsymbol{k}_{i j}^{l}\right)+\right.\right.\\ &\left.\left.\sum\limits_{i \in \boldsymbol{M}_{\mathrm{D}j}^{l}}\left(\beta \cdot \boldsymbol{a}_{i}^{l-1} * \boldsymbol{k}_{i j}^{l}\right)+\boldsymbol{b}^{l}\right]\right\} \end{aligned} $ | (13) |
式中,自然光图像特征和深度图像特征的融合系数分别用α和β表示,其取值由两个单支模型的识别率决定,即
$ \frac{\alpha}{\beta}=\frac{R_{\mathrm{RGB}}}{R_{\mathrm{depth}}}, \alpha+\beta=1 $ | (14) |
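式(12)—(14)的融合过程可示意如下:1×1卷积在数值上等价于逐像素的通道线性变换,融合权重按式(14)由两个单支模型的识别率归一化得到(函数名、参数名均为示意假设):

```python
import numpy as np

def fusion_weights(r_rgb, r_depth):
    # 式(14): alpha/beta = R_RGB/R_depth 且 alpha + beta = 1
    alpha = r_rgb / (r_rgb + r_depth)
    return alpha, 1.0 - alpha

def conv1x1_fuse(feat_rgb, feat_depth, w, b, alpha, beta):
    # feat_*: (H, W, D)特征图; 先按式(13)的系数加权并沿通道合并(concatenation),
    # 再按式(12)用1x1卷积核 w:(2D, D) 将通道数压回D, 高度和宽度保持不变
    y_cat = np.concatenate([alpha * feat_rgb, beta * feat_depth], axis=-1)
    return y_cat @ w + b
```

卷积核w由网络训练得到,自动学习两类模态通道间的对应关系;此处以随意给定的w演示形状变化。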
1.2.3 多源域迁移及双流融合的特征提取
结合上述多源域迁移以及双流融合的特征提取方法,设计本文提取目标域特征的具体操作流程,如图 6所示。由图可知,首先,多源域混淆的深度迁移特征获得源域的通用性特征后,通过
1.3 复合分类损失函数
1.3.1 softmax损失函数
softmax损失函数(Xie等,2015)将上一层输出的特征参数映射到目标类别中,其定义为
$ {L_{\rm{s}}} = - \frac{1}{m}\left[ {\sum\limits_{i = 1}^m {\sum\limits_{j = 1}^k 1 } \left\{ {{y^{(i)}} = j} \right\}\lg \frac{{{{\rm{e}}^{{\mathit{\pmb{W}}}_j^{\rm{T}}{x^{(i)}}}}}}{{\sum\limits_{l = 1}^k {{{\rm{e}}^{{\mathit{\pmb{W}}}_l^{\rm{T}}{x^{(i)}}}}} }}} \right] $ | (15) |
式中,m为样本数,k为类别数,1{·}为指示函数。
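式(15)的softmax损失可按如下方式示意(工程实现通常以自然对数代替原文的lg并做数值稳定处理,此处即按该常规写法,属于示意假设):

```python
import numpy as np

def softmax_loss(logits, labels):
    # logits: (m, k), 即各类的W^T x; labels: 各样本的真实类别j
    z = logits - logits.max(axis=1, keepdims=True)        # 数值稳定化
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax概率
    m = len(labels)
    # 指示函数1{y^(i)=j}仅保留真实类别对应的对数概率
    return float(-np.mean(np.log(p[np.arange(m), labels])))
```

预测越接近真实类别,损失越接近0;各类输出相同时,二分类损失为ln2。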
1.3.2 中心损失函数
中心损失函数(center loss)(张延安等,2017)利用目标的特征中心,借助聚类思想搭建其损失函数的具体形式,实现流程为
$ {L_{\rm{c}}} = \frac{1}{{2m}}\sum\limits_{i = 1}^m {\left\| {{x_i} - {c_{yi}}} \right\|_2^2} $ | (16) |
$ \frac{{\partial {L_c}}}{{\partial {x_i}}} = {x_i} - {c_{yi}} $ | (17) |
$ \Delta {c_j} = \frac{{\sum\limits_{i = 1}^m \delta \left({{y_i} = j} \right) \cdot \left({{c_j} - {x_i}} \right)}}{{1 + \sum\limits_{i = 1}^m \delta \left({{y_i} = j} \right)}} $ | (18) |
同时,引入中心特征的迭代更新规则
$ c_j^{t + 1} = c_j^t + \Delta c_j^t $ | (19) |
使中心特征在小批次中完成更新。
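式(16)—(19)的中心损失及中心更新可示意如下(按中心损失的常规实现,引入假设的更新率lr控制中心移动幅度,并沿损失下降方向即朝类内样本均值更新;变量名为示意假设):

```python
import numpy as np

def center_loss(x, labels, centers):
    # 式(16): L_c = (1/2m) * sum ||x_i - c_{y_i}||^2
    diff = x - centers[labels]
    return float(np.sum(diff ** 2) / (2 * len(x)))

def update_centers(x, labels, centers, lr=0.5):
    # 式(18): delta_c_j = sum(delta(y_i=j)*(c_j - x_i)) / (1 + sum(delta(y_i=j)))
    # 式(19)按delta_c在小批次内迭代更新各类特征中心
    new_centers = centers.copy()
    for j in range(len(centers)):
        mask = labels == j
        delta = np.sum(centers[j] - x[mask], axis=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - lr * delta
    return new_centers
```

更新后各类中心向类内样本均值靠拢,中心损失随之下降,从而压缩类内距离。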
1.3.3 构建复合分类损失函数
综合softmax和center loss的优点,构建复合分类损失函数。其中,softmax计算样本间的类间差异性,而center loss计算样本间的类内相似性。结合1.1节提到的MMD损失和域间分类器差异损失,模型的总损失函数为
$ \begin{array}{l} {L_{{\rm{total }}}} = {L_{{\rm{cls}}}} + \gamma {L_{{\rm{MMD}}}} + \eta {L_{{\rm{disc}}}} = \\ \;\;{L_{\rm{s}}} + \lambda {L_{\rm{c}}} + \gamma {L_{{\rm{MMD}}}} + \eta {L_{{\rm{disc}}}} \end{array} $ | (20) |
式中,λ、γ和η分别为中心损失、MMD损失和域间分类器差异损失的权重系数。
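式(20)的复合损失即四项损失的加权和,可示意为(各权重的默认取值仅为示意假设,实际取值需按实验调优):

```python
def total_loss(l_s, l_c, l_mmd, l_disc, lam=0.5, gamma=1.0, eta=1.0):
    # 式(20): L_total = L_s + lambda*L_c + gamma*L_MMD + eta*L_disc
    return l_s + lam * l_c + gamma * l_mmd + eta * l_disc
```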
2 实验结果与分析
为验证本文方法不依赖预训练模型的具体结构形式,实验选用AlexNet、VggNet-16(Simonyan等,2014)、ResNet-50(He等,2016)以及DenseNet-169(Huang等,2017)作为迁移学习中目标数据域的特征提取器。
2.1 实验平台和数据集
实验软件选用PyTorch深度学习框架作为实现深度神经网络的平台;实验硬件平台选用NVIDIA Quadro P5000的显卡进行加速计算。实验中的目标数据集是美国字母手势(ASL)(Pansare等,2012),共24类,每类手势包含自然光图像和对应的深度图像。实验中从ASL的每类手势图像中抽取自然光图像和深度图像各1 000幅作为实验数据,其中训练集600幅, 验证集和测试集各200幅。多源域数据由ImageNet数据集、自采集的简单背景和复杂背景手势彩色图像组成,对比实验中的单源域模型中的源域数据集是ImageNet数据集。自采集图像由10位参与者拍摄的1 920幅手势样本组成,每种背景下各960幅,经过数据增广后,每类背景下各包含4 800幅手势图像样本,如图 7和图 8所示。
2.2 实验超参数设置
实验的超参数设置如表 1所示。首先,为抑制单一样本输入引起的梯度震荡并提高训练速率,将训练样本按照批次(batch)的方式输入到网络中,每批次输入60个训练样本,共迭代24 000次。参数valid_interval和valid_iter表示模型每训练1 000次迭代进行一次验证,每次验证迭代600次;test batch_size表示每次输入16幅样本进行批量测试。其次,网络采用含动量的随机梯度下降(SGD)优化策略,学习率从初始的0.01开始,每迭代1 000次(步长)衰减为原来的0.1倍。最后,为抑制模型的过拟合,添加权值衰减的正则项系数;同时为防止梯度爆炸,设置梯度阈值,使权重更新保持在合理范围内。
表 1
超参数设置
Table 1
Settings of hyperparameters
参数 | 值 |
train batch_size | 60 |
epoch (max_iter) | 100(24 000) |
valid_interval | 1 000 |
valid_iter | 600 |
test batch_size | 16 |
动量 | 0.9 |
步长 | 1 000 |
base_lr | 0.01 |
权重衰减 | 0.000 5 |
梯度阈值 | 40 |
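表1中的学习率阶梯衰减与梯度阈值策略可用纯Python示意如下(实际实验在PyTorch中由优化器与学习率调度器完成,此处函数名为示意假设):

```python
import math

def stepped_lr(base_lr, gamma, step_size, iteration):
    # 表1策略: 初始学习率0.01, 每迭代step_size(1000)次乘以gamma(0.1)
    return base_lr * gamma ** (iteration // step_size)

def clip_by_norm(grads, max_norm):
    # 梯度阈值: 当梯度范数超过max_norm(表1中为40)时按比例缩放
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    return [g * max_norm / norm for g in grads]
```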
2.3 模型识别流程
模型的前向学习包括多源域数据的特征迁移和双流目标域数据的特征融合形成的目标域特征,最终经1.3节设计的复合损失函数完成分类任务;而反向传播过程借助预训练模型在目标域上的适配学习,通过不断地迭代更新目标模型参数。具体的识别流程如图 9所示。
2.4 实验结果
2.4.1 验证复合损失函数的有效性
为寻找中心损失最佳的权重值,实验以0.1为间隔改变权重因子,各训练100次epoch,得到4种网络效果最优的中心损失权重值及对应准确度,如表2所示。
表 2
中心损失的权重值
Table 2
Weights of center loss
模型 | 中心损失权重 | 准确度/%
AlexNet | 0.7 | 88.36 |
VggNet-16 | 0.6 | 84.96 |
ResNet-50 | 0.5 | 56.62 |
DenseNet-169 | 0.5 | 64.93 |
单支损失函数模型的识别结果如图 10所示。在最佳的
2.4.2 验证多源域混淆迁移学习的有效性
2.4.3 多源域混淆的双流深度迁移学习模型性能
以DenseNet-169预训练模型为例,具体说明双流融合策略对模型识别性能的影响。由式(14)可知,融合信息中两类模态流的权重由单支模型的准确度决定,因此融合权重分别为
与图 12相比可知:1)经双流特征融合后,模型的收敛速度明显提升;2)第10次epoch时,经双流特征融合后,识别率上升到97.17%,表明双流特征融合的网络结构可有效提升模型的收敛速度和预测准确度。为全面了解各个模型对每类手势的预测能力,本文通过精度和召回率两个指标展示识别结果,如图 14和图 15所示。
对比上述4类模型可以看出,首先,所提的多源域混淆的深度迁移学习模型对每类手势的预测结果中,正确预测所占的比例平均为0.976,明显高于其他模型的预测精度;其次,所提模型正确预测为正(负)样本占实际正(负)样本的比例达到0.968,优于其他模型,说明该模型具备较强的识别敏感度。
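文中的精度(预测为某类的样本中预测正确的比例)与召回率(某类真实样本中被正确识别的比例)按定义计算如下(示意实现):

```python
def precision_recall(preds, labels, cls):
    # 针对指定手势类别cls统计: 精度 = TP/(TP+FP), 召回率 = TP/(TP+FN)
    tp = sum(1 for p, y in zip(preds, labels) if p == cls and y == cls)
    fp = sum(1 for p, y in zip(preds, labels) if p == cls and y != cls)
    fn = sum(1 for p, y in zip(preds, labels) if p != cls and y == cls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```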
本文实验中的模型识别结果如表 3所示。由表可知,在不同的预训练模型下,利用多源域数据集、两阶段适配策略、双流融合方法以及复合损失函数,有效完成了深度神经网络在小样本数据集上的识别任务,且神经网络的特征描述能力和识别性能获得了明显提升。
表 3
少量样本在ASL数据集上的识别结果
Table 3
Recognition results on few-shot ASL samples
注:表中数值为识别准确度/%
模型 | 单支损失(单源域) | 复合损失(单源域) | 复合损失(多源域) | 复合损失(多源域双流)
AlexNet | 84.41 | 88.36 | 89.69 | 92.87
VggNet-16 | 82.19 | 84.96 | 90.18 | 94.13
ResNet-50 | 49.84 | 56.62 | 82.16 | 85.79
DenseNet-169 | 54.72 | 64.93 | 93.27 | 97.17
为比较本文所提模型的优越性,首先与未进行迁移学习的HSF+RDF(hue saturation value + random decision forest)模型(Pugeault等,2011)、SIFT+PLS(scale-invariant feature transform + partial least squares)模型(Estrela等,2013)和MPC(model predictive control)模型(Pansare等,2012)在ASL整体数据集上进行对比,然后与迁移学习中的DAN(deep adaptation network)(Long等,2015)、D-CORAL(deep CORAL)(Sun等,2016)和RevGrad(Ganin等,2014) 3类经典模型在相同的目标数据域上进行对比,结果如表 4所示。由表可知,多源域混淆的DenseNet-169双流深度迁移学习模型的识别率为97.17%,高于其他模型的识别准确度,证明本文所提方法具有一定的性能优越性和研究价值。
表 4
不同模型的识别率比较
Table 4
Comparison of recognition rates of different models
模型 | 数据量 | 准确度/% |
HSF+RDF | 120 000 | 75.21 |
SIFT+PLS | 120 000 | 71.51 |
MPC | 120 000 | 90.19 |
DAN | 48 000 | 92.4 |
D-CORAL | 48 000 | 91.37 |
RevGrad | 48 000 | 94.83 |
DenseNet-169(本文) | 48 000 | 97.17 |
3 结论
针对深度迁移学习中的源模型在目标数据集上抽取的通用性特征缺乏适用性的问题,提出了一种多源域混淆的双流深度迁移学习方法,在模型的特征提取和损失函数部分进行了改进。为增强源域特征迁移的高效性,丰富目标域的特征参数,抑制小样本深度学习中严重的过拟合问题,首先引入了多源深层特征迁移方法增强目标特征的表征能力;其次针对如何对齐源域与目标域表示特征的问题,提出了多源域混淆的迁移学习策略;最后结合深度图像的3维特征信息,提出双流卷积的特征融合策略,实现了目标的2维和3维模态信息的相互补充和约束。在域内分类损失部分,通过引入中心损失函数提高对类间及类内特征的惩罚监督性能;在深层特征适配损失部分,引入域不变深层特征表示的计算,进行域间特征的相互对齐;在域间分类器损失部分,引入差异损失函数,实现不同域分类器的相互适配。最后在ASL上抽取的少量样本数据集上,选用单支损失函数模型、复合损失函数模型、多源域混淆的迁移学习模型以及多源域混淆的双流迁移学习模型进行对比实验,证明了所提模型的优越性。
下一步的工作包括特征融合和模型优化两个方面。在特征融合方面,由于深度迁移学习中不同网络结构的源模型可以学习到不同的表示特征,如何充分融合这些有效的特征需要进行更加深入的探索;在模型优化方面,本文所提模型对少量样本深度学习具有一定的研究价值,但是模型的训练过程会出现一定程度的过拟合问题,如何对源模型进行有效的网络适配训练,提升模型的泛化能力,是一个很有意义的研究方向。
参考文献
-
Borgwardt K M, Gretton A, Rasch M J, Kriegel H P, Scholkopf B, Smola A J. 2006. Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics, 22(14): e49-e57 [DOI:10.1093/bioinformatics/btl242]
-
Chen Z T, Fu Y W, Zhang Y D, Jiang Y G, Xue X and Sigal L. 2018. Semantic feature augmentation in few-shot learning[EB/OL].[2019-03-28]. https://arxiv.org/pdf/1804.05298.pdf
-
Deng J, Dong W, Socher R, Li L J, Li K and Fei-Fei L. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE, 248-255[DOI: 10.1109/CVPR.2009.5206848]
-
Estrela B, Cámara-Chávez G, Campos M F M, Schwartz W R and Nascimento E R. 2013. Sign language recognition using partial least squares and RGB-D information//Proceedings of 2013 Conference on Workshop de Visão Computacional. Minas Gerais, Brazil: IEEE, 672-678
-
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks//Proceedings of 2017 IEEE Conference on Machine Learning. Sydney, Australia: IEEE, 1126-1135
-
Ganin Y and Lempitsky V. 2014. Unsupervised domain adaptation by backpropagation[EB/OL].[2019-03-28].https://arxiv.org/pdf/1409.7495.pdf
-
Garcia V and Bruna J. 2017. Few-shot learning with graph neural networks[EB/OL].[2019-03-28]. https://arxiv.org/pdf/1711.04043.pdf
-
Ghifary M, Kleijn W B and Zhang M J. 2014. Domain adaptive neural networks for object recognition//Proceedings of the 13th Pacific Rim International Conference on Artificial Intelligence. Gold Coast, QLD, Australia: Springer, 898-904[DOI: 10.1007/978-3-319-13560-1_76]
-
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 770-778[DOI: 10.1109/CVPR.2016.90]
-
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2261-2269[DOI: 10.1109/CVPR.2017.243]
-
Huang S W, Lin C T, Chen S P, Wu Y Y, Hsu P H and Lai S H. 2018. AugGAN: cross domain adaptation with GAN-based data augmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 718-731[DOI: 10.1007/978-3-030-01240-3_44]
-
Koch G, Zemel R and Salakhutdinov R. 2015. Siamese neural networks for one-shot image recognition//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM, 212-217
-
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: ACM, 1097-1105
-
Liu J C, Osadchy M, Ashton L, Foster M, Solomon C J, Gibson S J. 2017. Deep convolutional neural networks for Raman spectrum recognition:a unified solution. Analyst, 142(21): 4067-4074 [DOI:10.1039/C7AN01371J]
-
Long M S, Cao Y, Wang J M and Jordan M I. 2015. Learning transferable features with deep adaptation networks//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM, 4891-4897
-
Oquab M, Bottou L, Laptev I and Sivic J. 2014. Learning and transferring mid-level image representations using convolutional neural networks//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, Ohio, USA: IEEE, 1717-1724[DOI: 10.1109/CVPR.2014.222]
-
Pansare J R, Gawande S H, Ingle M. 2012. Real-time static hand gesture recognition for American Sign Language (ASL) in complex background. Journal of Signal and Information Processing, 3(3): 364-367 [DOI:10.4236/jsip.2012.33047]
-
Pugeault N and Bowden R. 2011. Spelling it out: real-time ASL fingerspelling recognition//Proceedings of 2011 International Conference on Computer Vision Workshops. Barcelona, Spain: IEEE, 1114-1119[DOI: 10.1109/ICCVW.2011.6130290]
-
Ravi S and Larochelle H. 2017. Optimization as a model for few-shot learning//Proceedings of 2017 International Conference on Machine Learning. New York, USA: IEEE, 1317-1325
-
Santoro A, Bartunov S, Botvinick M, Wierstra D and Lillicrap T. 2016. Meta-learning with memory-augmented neural networks//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: IEEE, 1842-1850
-
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL].[2019-03-28].https://arxiv.org/pdf/1409.1556.pdf
-
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: ACM, 4077-4087
-
Sun B C and Saenko K. 2016. Deep coral: correlation alignment for deep domain adaptation//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, Netherlands: Springer, 443-450[DOI: 10.1007/978-3-319-49409-8_35]
-
Tzeng E, Hoffman J, Zhang N, Saenko K and Darrell T. 2014. Deep domain confusion: maximizing for domain invariance[EB/OL].[2019-03-28]. https://arxiv.org/pdf/1412.3474.pdf
-
Vinyals O, Blundell C, Lillicrap T and Wierstra D. 2016. Matching networks for one-shot learning//Proceedings of the 30th Conference on Neural Information Processing Systems. Barcelona, Spain: IEEE, 3630-3638
-
Xie S N and Tu Z W. 2015. Holistically-nested edge detection//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 1395-1403[DOI: 10.1109/ICCV.2015.164]
-
Xu L L, Zhang S M, Zhao J L. 2019. Expression recognition algorithm for parallel convolutional neural networks. Journal of Image and Graphics, 24(2): 227-236 (徐琳琳, 张树美, 赵俊莉. 2019. 构建并行卷积神经网络的表情识别算法. 中国图象图形学报, 24(2): 227-236) [DOI:10.11834/jig.180346]
-
Yu H P, Zhang P, Zhu J. 2017. Study on face recognition method based on deep transfer learning. Journal of Chengdu University:Natural Science, 36(2): 151-156 (余化鹏, 张朋, 朱进. 2017. 基于深度迁移学习的人脸识别方法研究. 成都大学学报:自然科学版, 36(2): 151-156) [DOI:10.3969/j.issn.1004-5422.2017.02.009]
-
Zhang Y A, Wang H Y, Xu F. 2017. Face recognition based on deep convolution neural network and center loss. Science Technology and Engineering, 17(35): 92-97 (张延安, 王宏玉, 徐方. 2017. 基于深度卷积神经网络与中心损失的人脸识别. 科学技术与工程, 17(35): 92-97) [DOI:10.3969/j.issn.1671-1815.2017.35.015]
-
Zheng Y, Chen Q Q, Zhang Y J. 2014. Deep learning and new progress in target and behavior recognition. Journal of Image and Graphics, 19(2): 175-184 (郑胤, 陈权崎, 章毓晋. 2014. 深度学习及其在目标和行为识别中的新进展. 中国图象图形学报, 19(2): 175-184) [DOI:10.11834/jig.20140202]