Published: 2019-09-16
DOI: 10.11834/jig.180514
2019 | Volume 24 | Number 9

Image Analysis and Recognition

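Unsupervised domain adaptation method based on multi-layer correction

Yao Minghai, Fang Cunliang

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023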

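Received: 2018-09-07; Revised: 2019-03-28

Funding: National Natural Science Foundation of China (61871350)

About the first author: Yao Minghai, born in 1963, male, professor and doctoral supervisor; main research interests: pattern recognition and image recognition. E-mail: ymh@zjut.edu.cn;
Fang Cunliang, male, master's student; main research interests: pattern recognition and image recognition. E-mail: 514616351@qq.com.

CLC number: TP391.4

Document code: A

Article ID: 1006-8961(2019)09-1528-09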

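Abstract

Objective Deep neural networks have been applied successfully to many machine learning tasks and have delivered remarkable performance gains. However, traditional deep networks and machine learning algorithms assume that training and test data follow the same distribution, an assumption that often fails in practice. When the training and test distributions differ substantially, the performance of classifiers trained with traditional machine learning algorithms drops sharply. To address this problem, we propose an unsupervised domain adaptation method based on multi-layer correction. Method First, multi-layer correction is used to adjust an existing deep network, with additive superposition aligning the data representations of the source and target domains. Next, a multi-layer weighted maximum mean discrepancy (MMD) is used to adapt to the target domain and increase the representational capacity of the network. Finally, the learned domain-invariant features are extracted for classification to obtain recognition results for the target images. Result The proposed algorithm was tested on four datasets, the Office-31 image dataset and three digit datasets, comparing the image recognition and classification performance of different algorithms by measuring accuracy. Compared with algorithms in the same field, the proposed algorithm improves average accuracy by at least 5 percentage points over the best competing method, and it also classifies well under interference such as illumination changes, complex backgrounds, and poor image quality, showing stronger robustness. Conclusion Experimental results on domain adaptation datasets show that the proposed method has a degree of generalization ability, achieves high classification performance, and outperforms other existing unsupervised domain adaptation methods.

Keywords

domain adaptation; domain-invariant features; multi-layer correction; image recognition; transfer learning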

Unsupervised domain adaptive method based on multi-layer correction

Yao Minghai, Fang Cunliang

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Supported by: National Natural Science Foundation of China (61871350)

Abstract

Objective With the continuous development of computer technology in recent years, image recognition has become one of the most active research topics in computer vision. Image recognition methods build robust classification models from existing training samples. Compared with traditional methods, deep learning algorithms, which combine low-level features into abstract high-level representations, have shown superior learning and classification performance. Traditional deep networks assume that training data and test data follow the same distribution, but this assumption often does not hold in real-world applications. At the same time, collecting and annotating datasets for each new task or domain is expensive and time-consuming, and the data available for training are limited. If the distributions of the training and test data differ greatly, the performance of classifiers trained by traditional machine learning algorithms drops considerably, and the advantages of deep network architectures that rely on large labeled datasets can no longer be fully exploited. Solving these problems requires cross-domain learning, which transfers knowledge across different data distributions, using the links between existing knowledge and experience to promote the learning of new tasks and ultimately reduce the impact of differences in sample distributions between domains. Domain adaptation algorithms address the degradation of classifier performance caused by differences between the distributions of training and test samples. Because they relax the identical-distribution requirement of traditional machine learning, such algorithms have become an effective approach to domain shift in image recognition.
Their goal is to overcome the differences between the training and test sample distributions, exploit the commonality between domains to improve the trained model, transfer the classifier, and ultimately improve classification accuracy. Method An unsupervised domain adaptation method based on multi-layer correction is proposed. A five-layer neural network structure is built by modifying and optimizing the ResNet-18 network. Corrections amend the internal representation of the target data so that it mimics the source data: additive corrections on the residual layers for the target data adapt the source-domain classifier to the target domain, and additive superposition aligns the data representations of the source and target domains. If the class prior distributions are not considered, class weight bias is easily overlooked, which degrades domain adaptation performance. Hence, we introduce class-specific auxiliary weights to re-weight the source samples, so that the re-weighted source data share the same class weights as the target data. On this basis, a multi-layer weighted MMD (maximum mean discrepancy) is applied to the fully connected layers to increase the representation capability of the network. Finally, the domain-invariant features obtained by learning are extracted and classified to obtain the final recognition results for the target images. Result Experiments on the Office-31 image dataset and on digit datasets such as MNIST are carried out to verify the validity of the proposed method. We compare the performance of different algorithms by measuring their classification accuracy in image and digit recognition. For the image dataset, we test two traditional transfer learning methods, three deep domain adaptation methods, and two mainstream deep neural network models.
Test results show that our method has higher recognition ability than existing methods, exceeding the compared methods by 9.5 percentage points in average classification accuracy. For the digit datasets, we test the classic convolutional neural network model LeNet, the subspace alignment method for domain adaptation (SA), and the deep adaptation network (DAN). The proposed method achieves the best classification performance, with an average improvement of 11.4 percentage points over the three compared methods. Under disturbances such as illumination changes, complex backgrounds, and poor image quality, the proposed method obtains better classification results and shows stronger robustness than the other methods. Conclusion Deep neural networks have good application prospects in image recognition, and domain adaptation is an effective way to deal with domain shift in image recognition. As network structures deepen and datasets grow, the cost and time of training keep increasing. Because it does not require target labels, unsupervised domain adaptation has become an important approach in image recognition, and applying domain adaptation to image recognition and other tasks is of theoretical significance. We propose a multi-layer correction network structure as a new method for unsupervised domain adaptation. The additional layers increase the capacity of the neural network and thereby improve its generalization performance. Experimental results on domain adaptation benchmark datasets show that the proposed method learns domain-invariant representations for domain adaptation problems, achieves high classification performance, and outperforms other existing unsupervised domain adaptation methods.

Key words

domain adaptation; domain invariant feature; multi-layer correction; image recognition; transfer learning

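0 Introduction

In recent years machine learning has achieved great success: deep neural networks have substantially improved many machine learning algorithms and have greatly benefited practical applications. However, collecting and annotating a dataset for every new task or domain is expensive and time-consuming, and enough data for training may not exist. Without sufficient labeled data, it is hard to train a classifier that fits the samples, and the advantages of deep architectures that rely on large labeled datasets disappear. Domain adaptation addresses the problem of adapting a classifier from a source domain (the training scenario) to a target domain (the test scenario)^[1], and therefore offers an efficient and practical solution when labeled data are scarce.

In practice, because of factors such as illumination, pose, and image quality, a well-trained source model may classify test images poorly^[2]. Although the source and target domains have different sample distributions, the two are related^[3]. Domain adaptation methods extract domain-invariant features from labeled source data and unlabeled target data to build a better learning model and improve the final classifier^[4].

Domain adaptation has been applied successfully to many real-world machine learning tasks, such as image classification^[5], face recognition^[6], and object detection^[7]. Tzeng et al.^[8] introduced the deep domain confusion (DDC) architecture to learn domain-invariant features of the source and target domains in the last hidden layer. Long et al.^[9-10] proposed the deep adaptation network (DAN), which adapts multiple layers to learn transferable features between two domains, and then, building on DAN, proposed the joint adaptation network (JAN), which uses the joint maximum mean discrepancy (JMMD) to adapt the joint distributions of multiple domain-specific layers across the source and target domains.

Although these methods achieve good results, they are limited to varying degrees. 1) When facing a new target domain, the deep network must be retrained from scratch instead of reusing an already trained model, which wastes time and resources. 2) Classification ability is bounded by the chosen base network: the adaptation process cannot improve the generalization ability of the base network, which limits practical performance.

To address these problems, this paper proposes a domain adaptation method based on multi-layer correction. The method works with unsupervised deep networks and requires no target labels, which greatly reduces labeling costs. The added multi-layer correction adjusts an existing deep network so that the data representations of the source and target domains are aligned. A multi-layer weighted maximum mean discrepancy is used to adapt to the target domain so that the source and target data distributions are better matched, which effectively reduces the substantial domain discrepancy; at the same time, the stacked additional layers deepen the base architecture and increase the representational capacity of the network.

In-depth experiments on four commonly used public datasets show that the features extracted by the proposed method are more robust, its classification accuracy is higher, and its domain adaptation ability exceeds that of other competitive unsupervised domain adaptation methods.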

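1 Related work

1.1 Domain adaptation

Domain adaptation algorithms use a large number of labeled source samples and unlabeled target samples, drawn from different distributions, to learn a target classifier $f_{\rm T}$ or prediction model that accurately classifies the target samples. We define the source domain as ${\mathit{\boldsymbol{D}}_{\rm{S}}} = \left\{ \left( {\mathit{\boldsymbol{X}}_{{\rm{S}}_i}}, {\mathit{\boldsymbol{Y}}_{{\rm{S}}_i}} \right) \right\}_{i = 1}^{n_{\rm{S}}}$ and the target domain as ${\mathit{\boldsymbol{D}}_{\rm{T}}} = \left\{ {\mathit{\boldsymbol{X}}_{{\rm{T}}_j}} \right\}_{j = 1}^{n_{\rm{T}}}$, where $\mathit{\boldsymbol{X}}$ denotes a sample, $\mathit{\boldsymbol{Y}}$ its class label (available only in the source domain), and $n_{\rm{S}}$ and $n_{\rm{T}}$ are the numbers of source and target samples, respectively. Let $P_{\rm{S}}(\mathit{\boldsymbol{X}}_{\rm{S}})$ and $P_{\rm{T}}(\mathit{\boldsymbol{X}}_{\rm{T}})$ denote the marginal probability distributions of the source and target domains. Because the two domains are distributed differently, $P_{\rm{S}}(\mathit{\boldsymbol{X}}_{\rm{S}}) \neq P_{\rm{T}}(\mathit{\boldsymbol{X}}_{\rm{T}})$.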

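Learning domain-invariant representations is essential for transferring knowledge from the source domain to the target domain. The main challenge of domain adaptation in deep networks lies not only in adapting the classifier to a new data domain but mainly in adapting the data representation^[11]. When labeled target data are available, fine-tuning a pre-trained neural network is a simple and effective way to exploit features learned on the source domain^[12]. Yosinski et al.^[12] showed that the transferability of features learned by deep networks is limited by fragile co-adaptation and representation specificity, and that fine-tuning can improve generalization performance. This paper studies the unsupervised setting, which requires no target labels and is therefore more broadly applicable.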

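1.2 Deep residual networks

It is well known that the representational power of a network increases with its depth. Experiments by He et al.^[13] showed that, of two network structures with the same time complexity, the deeper one tends to perform better. However, deeper is not always better: among networks of the same complexity, adding more layers eventually makes performance degrade significantly instead of improving. He et al. therefore proposed deep residual networks^[14], which solve this degradation problem with shortcut connections that pass the input directly to the output as an initial result for training. Long et al.^[15] studied residual functions for domain adaptation and designed an end-to-end deep convolutional neural network that transfers the source-task classifier to the target task, adapting both features and classifiers.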

1.3 Maximum mean discrepancy

Maximum mean discrepancy (MMD) is an effective nonparametric measure for comparing distributions on the basis of two sets of samples^[16]. Given two distributions $s$ and $t$, and a function $\phi(\cdot)$ that maps the data into a reproducing kernel Hilbert space (RKHS), the MMD between $s$ and $t$ is defined as

$ f_{\rm{MMD}}^2(s,t) = \mathop{\sup}\limits_{\|\phi\|_{\rm{H}} \le 1} \left\| {\rm{E}}_{X^s \sim s}\left[ \phi(X^s) \right] - {\rm{E}}_{X^t \sim t}\left[ \phi(X^t) \right] \right\|_{\rm{H}}^2 $

(1)

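where ${\rm{E}}_{X^s \sim s}[\phi(\cdot)]$ denotes the expectation with respect to distribution $s$, and $\|\phi\|_{\rm{H}} \le 1$ defines a set of functions in the unit ball of the RKHS. For the statistical test based on MMD, $f^{2}_{\rm MMD}(s, t)=0$ when $s=t$. Given two sample sets $\mathit{\boldsymbol{D}}_{s}$ and $\mathit{\boldsymbol{D}}_{t}$, the empirical estimate of MMD given in Ref. [17] is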

$ f_{{\rm{MMD}}}^2\left( {{\mathit{\boldsymbol{D}}_s},{\mathit{\boldsymbol{D}}_t}} \right) = {\rm{||}}\frac{1}{M}\sum\limits_{i = 1}^M \phi \left( {\mathit{\boldsymbol{X}}_i^s} \right) - \frac{1}{N}\sum\limits_{j = 1}^N \phi \left( {\mathit{\boldsymbol{X}}_j^t} \right){\rm{||}}_{\rm{H}}^2 $

(2)

where $M$ and $N$ are the numbers of source and target samples, respectively, and $\phi(\cdot)$ is the feature map associated with the kernel $k\left(\mathit{\boldsymbol{X}}^s, \mathit{\boldsymbol{X}}^t\right) = \left\langle \phi\left(\mathit{\boldsymbol{X}}^s\right), \phi\left(\mathit{\boldsymbol{X}}^t\right) \right\rangle$. The kernel $k(\mathit{\boldsymbol{X}}^{s}, \mathit{\boldsymbol{X}}^{t})$ is usually defined as a convex combination of base kernels $k_{l}(\mathit{\boldsymbol{X}}^{s}, \mathit{\boldsymbol{X}}^{t})$^[9].
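The empirical estimate in Eq. (2) can be evaluated without computing $\phi$ explicitly by expanding the squared norm with the kernel trick: mean $k$ over source pairs, plus mean $k$ over target pairs, minus twice the mean cross-domain $k$. The Python sketch below does this for a kernel built as a convex combination of Gaussian base kernels. The Gaussian form, the bandwidths (0.5, 1, 2), the equal weights of 1/3, and all function names are illustrative assumptions, not settings taken from this paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float) -> float:
    """Gaussian RBF base kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x: Vector, y: Vector,
                 bandwidths: Sequence[float], betas: Sequence[float]) -> float:
    """Convex combination sum_l beta_l * k_l(x, y); betas are non-negative and sum to 1."""
    return sum(beta * gaussian_kernel(x, y, bw) for beta, bw in zip(betas, bandwidths))


def mmd2(source: List[Vector], target: List[Vector],
         bandwidths: Sequence[float] = (0.5, 1.0, 2.0),
         betas: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Empirical MMD^2 of Eq. (2), expanded with the kernel trick:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def k(a: Vector, b: Vector) -> float:
        return multi_kernel(a, b, bandwidths, betas)

    m, n = len(source), len(target)
    k_ss = sum(k(a, b) for a in source for b in source) / (m * m)
    k_tt = sum(k(a, b) for a in target for b in target) / (n * n)
    k_st = sum(k(a, b) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2.0 * k_st


xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[a + 3.0, b] for a, b in xs]
print(mmd2(xs, xs))       # ~0: identical sample sets give zero discrepancy
print(mmd2(xs, shifted))  # positive: the shift along the first axis is detected
```

This is the biased (V-statistic) estimate, which is exactly the squared norm in Eq. (2); it costs quadratic time in the number of samples, which is the cost discussed for mini-batch training in Section 2.2.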

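MMD has been used in many deep learning applications, from generative models^[18] to image transformation^[19] to domain adaptation^[8-9, 15], and it has performed well in each of them.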

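2 Network architecture

2.1 Multi-layer correction

As shown in Fig. 1, a residual block in a residual network has, besides the normal output of its convolutional layers, a branch that connects the input directly to the output; this input is added arithmetically to the convolutional output to form the final output, so that the block can realize an identity mapping^[20]. Because the residual structure explicitly builds in the identity mapping, the whole structure can converge toward it, which helps ensure that the error rate does not keep getting worse as the network becomes deeper.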
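A minimal Python sketch of the shortcut idea follows. For illustration it assumes fully connected weight layers instead of convolutions and omits normalization; `residual_block`, `w1`, and `w2` are hypothetical names. When the residual branch outputs zero, the block reduces exactly to the identity mapping described above.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = F(x) + x, with residual branch F(x) = W2 @ relu(W1 @ x) and an identity shortcut."""
    f = matvec(w2, relu(matvec(w1, x)))
    return [f_i + x_i for f_i, x_i in zip(f, x)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(residual_block(x, zeros, zeros))        # [1.0, -2.0, 3.0]: F(x) = 0 gives the identity map
print(residual_block(x, identity, identity))  # [2.0, -2.0, 6.0]: relu(x) is added to x
```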

Fig. 1 Residual structure

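Following the idea of residual connections in residual networks and inspired by He et al.^[14], we build a small multi-layer neural network structure, called the correction structure, shown in Fig. 2. Because the target data lack labels, a classifier for the target domain cannot be trained directly. We therefore apply additive corrections to the residual layers for the target data so that the source-domain classifier adapts to the target domain. Domain adaptation with deep neural networks learns not only how to optimize the classifier but also its internal data representation; we correct the internal representation of the target data by adding small correction terms so that it mimics the source data.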

Fig. 2 Correction structure

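Let $\mathit{\boldsymbol{F}}_{\rm{S}}$ and $\mathit{\boldsymbol{F}}_{\rm{T}}$ denote the source and target classifiers, and let $H_{\rm{S}}(x)$ be the representation of input $x$ at a given layer of $\mathit{\boldsymbol{F}}_{\rm{S}}$. To adapt $\mathit{\boldsymbol{F}}_{\rm{T}}$ to $\mathit{\boldsymbol{F}}_{\rm{S}}$, we learn a hidden representation $H_{\rm{T}}(x)$ for the target data such that the source and target distributions are as close as possible, i.e., $P_{\rm{S}}(H_{\rm{S}}(x), y) \approx P_{\rm{T}}(H_{\rm{T}}(x), y)$. An additive correction term $\Delta h(x)$ then adjusts the target representation so that $H_{\rm{T}}(x)=H_{\rm{S}}(x)+\Delta h(x)$.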
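The additive correction $H_{\rm{T}}(x)=H_{\rm{S}}(x)+\Delta h(x)$ can be sketched in Python as below. This is a toy illustration under stated assumptions, not the authors' implementation: the correction branch takes $H_{\rm{S}}(x)$ as input, uses two fully connected weight layers, and applies a per-vector normalization and a ReLU after the first weight layer. The exact layer order and normalization type are assumptions, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Weight layer (fully connected, no bias): returns W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def normalize(v: Vector, eps: float = 1e-5) -> Vector:
    """Zero-mean, unit-variance normalization of one feature vector
    (a stand-in for the normalization layer of the correction structure)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def delta_h(h_s: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Correction term: weight layer -> normalization -> ReLU -> weight layer (assumed order)."""
    z = [max(0.0, a) for a in normalize(matvec(w1, h_s))]
    return matvec(w2, z)


def target_representation(h_s: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """H_T(x) = H_S(x) + delta_h(x)."""
    return [a + b for a, b in zip(h_s, delta_h(h_s, w1, w2))]


h_s = [0.5, -1.0, 2.0]
w1 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
w2_zero = [[0.0] * 3 for _ in range(3)]
w2_small = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(target_representation(h_s, w1, w2_zero))   # equals h_s: no correction applied
print(target_representation(h_s, w1, w2_small))  # h_s plus a small additive correction
```

Initializing the last weight layer at or near zero makes the corrected network start from the source representation, which is consistent with the idea that $\Delta h(x)$ is a small correction.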

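As Fig. 2 shows, the residual function $\Delta h(x)$ is learned by two weight layers. Because the feature distributions of low-level and high-level feature maps differ greatly, fusing low-level and high-level features directly through a residual structure would make the network hard to train, so a normalization layer is added to the correction structure to normalize the data. After the normalization layer, a ReLU activation, which provides one-sided suppression and sparse activation, improves the nonlinear modeling ability of the network. Because labels are available for all source samples and source samples need no feature correction, the target classifier $\mathit{\boldsymbol{F}}_{\rm{T}}$ is also trained to classify the source data correctly, as a form of regularization. The regularizer $R_{\rm{S}}(x)$ is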

$ R_{\rm{S}}(x) = \begin{cases} l_{\rm{S}}(x) + \frac{1}{n}\left\| \Delta h(x) \right\|^2, & x \in \mathit{\boldsymbol{D}}_{\rm{S}} \\ 0, & x \in \mathit{\boldsymbol{D}}_{\rm{T}} \end{cases} $

(3)

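where $l_{\rm{S}}(x)$ is the softmax loss on the source data and $n$ is the number of parameters in $H_{\rm{S}}(x)$; the penalty term constrains $\Delta h(x)$ to be approximately zero for the source data. Depending on its input, a weight layer can be a 3×3 convolutional layer or a fully connected layer. The complete transformation $H_{\rm{T}}(x)$ is obtained by adding the output of the final residual layer to $H_{\rm{S}}(x)$.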
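Eq. (3) for a single sample can be sketched in Python as follows. The sketch assumes that $l_{\rm{S}}$ is the standard softmax cross-entropy and takes $n$ as the number of entries of $\Delta h(x)$, which is our reading of the text; `source_regularizer` and its arguments are hypothetical names.

```python
import math
from typing import List, Optional


def softmax_loss(logits: List[float], label: int) -> float:
    """l_S(x): softmax cross-entropy of one labelled source sample, via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def source_regularizer(logits: List[float], label: Optional[int],
                       delta_h: List[float], is_source: bool) -> float:
    """R_S(x) of Eq. (3): l_S(x) + ||delta_h(x)||^2 / n for source samples, 0 for target samples.
    n is taken to be the number of entries of delta_h (an assumption)."""
    if not is_source:
        return 0.0  # unlabelled target samples contribute nothing to R_S
    if label is None:
        raise ValueError("source samples must have a label")
    n = len(delta_h)
    penalty = sum(d * d for d in delta_h) / n  # pushes delta_h toward 0 on source data
    return softmax_loss(logits, label) + penalty


print(source_regularizer([0.0, 0.0], 0, [0.1, -0.1], is_source=True))     # log(2) + 0.01, about 0.7031
print(source_regularizer([2.0, -1.0], None, [0.3, 0.3], is_source=False))  # 0.0
```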

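2.2 Weighted MMD

Most existing domain adaptation methods compute MMD as defined in Eq. (2) and, for simplicity, use only a linear kernel. The MMD in Eq. (2) relies on pairwise similarities and takes quadratic time to compute, so using it with mini-batch stochastic gradient descent (SGD) in convolutional neural network (CNN) based domain adaptation is time-consuming and less effective. Moreover, MMD-based methods, especially for classification problems, do not consider the class prior distributions and therefore tend to ignore class weight bias, which degrades domain adaptation performance. The weakness of MMD is that, when the source and target domains have different class weights or the target domain lacks classes present in the source domain, MMD forces the class weights of the two domains to agree, which leads to misclassification.

To address this problem, we introduce class-specific auxiliary weights to re-weight the source samples, so that the re-weighted source data share the same class weights as the target data. In general, the difference between the class-conditional distributions $p_{s}(\mathit{\boldsymbol{X}}^{s}|y^{s}=c)$ and $p_{t}(\mathit{\boldsymbol{X}}^{t}|y^{t}=c)$ (where $c$ is the class index) could measure the discrepancy between the source and target domains. Because target labels are unavailable in unsupervised domain adaptation, however, the MMD between the source density $p_{s}(\mathit{\boldsymbol{X}}^{s})$ and the target density $p_{t}(\mathit{\boldsymbol{X}}^{t})$ is used as the domain discrepancy measure instead. In practice the class weights of the two domains differ, and MMD cannot cope with this cross-domain class weight bias. We therefore construct a reference source distribution $p_{s, \alpha}(\mathit{\boldsymbol{X}}^{s})$ that has the same class weights as the target domain and the class-conditional distributions of the source domain, and we use it to compute the discrepancy between the two domains. Given the class priors $w^{s}_{c}$ and $w^{t}_{c}$ of the source and target samples, let $\alpha_{c}=w^{t}_{c}/w^{s}_{c}$; the reference source distribution is defined as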

$ {p_{s,\alpha }}\left( {{\mathit{\boldsymbol{X}}^s}} \right) = \sum\limits_{c = 1}^C {{\alpha _c}} w_c^s{p_s}\left( {{\mathit{\boldsymbol{X}}^s}|{y^s} = c} \right) $

(4)

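where $p_{s}(\mathit{\boldsymbol{X}}^{s}|y^{s}=c)$ is the class-conditional distribution of the source domain, $y^{s}$ is the class label of source sample $\mathit{\boldsymbol{X}}^{s}$, and $c = 1, 2, \dots, C$ indexes the $C$ classes.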
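The following Python sketch checks on toy labels that the reference distribution in Eq. (4) carries the target class weights: its weight for class $c$ is $\alpha_c w^s_c = w^t_c$. Source priors are estimated from label counts; the target priors are simply given here, because this section does not say how $w^t_c$ is obtained without target labels. All names and numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def class_priors(labels: List[int]) -> Dict[int, float]:
    """Empirical class prior w_c = (number of samples with label c) / (total number of samples)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: k / total for c, k in sorted(counts.items())}


def alpha_weights(w_s: Dict[int, float], w_t: Dict[int, float]) -> Dict[int, float]:
    """alpha_c = w_c^t / w_c^s for every class present in the source domain."""
    return {c: w_t.get(c, 0.0) / w_s[c] for c in w_s}


source_labels = [0, 0, 0, 1]   # source class priors: {0: 0.75, 1: 0.25}
w_s = class_priors(source_labels)
w_t = {0: 0.5, 1: 0.5}         # target class priors, assumed given
alpha = alpha_weights(w_s, w_t)

# Class weights of the reference distribution p_{s,alpha} in Eq. (4): alpha_c * w_c^s.
reference = {c: alpha[c] * w_s[c] for c in w_s}
print(alpha)      # class 1 is up-weighted (2.0), class 0 down-weighted (about 0.667)
print(reference)  # approximately {0: 0.5, 1: 0.5}, i.e. the target priors
```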

Hence, given the class weights of the target samples, the empirical estimate of the weighted MMD is

$ f_{{\rm{MMD}},w}^2\left( {\mathit{\boldsymbol{D}}_s},{\mathit{\boldsymbol{D}}_t} \right) = \left\| \frac{1}{\sum\limits_{i = 1}^M \alpha_{y_i^s}} \sum\limits_{i = 1}^M \alpha_{y_i^s} \phi\left( \mathit{\boldsymbol{X}}_i^s \right) - \frac{1}{N}\sum\limits_{j = 1}^N \phi\left( \mathit{\boldsymbol{X}}_j^t \right) \right\|_{\rm{H}}^2 $

(5)

where $M$ and $N$ are the numbers of source and target samples, respectively.
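A Python sketch of Eq. (5), expanded with the kernel trick, is given below. For brevity it uses a single Gaussian kernel, which is an assumption. The hypothetical toy data isolate class weight bias: both domains contain the same two classes at the same locations but in different proportions, so plain MMD reports a discrepancy, while the re-weighted estimate with $\alpha_c = w^t_c / w^s_c$ drops to approximately zero.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def weighted_mmd2(source: List[Vector], source_labels: List[int], target: List[Vector],
                  alpha: Dict[int, float], bandwidth: float = 1.0) -> float:
    """Empirical weighted MMD^2 of Eq. (5), expanded with the kernel trick.
    Source sample i is weighted by alpha[y_i^s]; the weights are normalized by their sum."""
    a = [alpha[y] for y in source_labels]
    z = sum(a)
    n = len(target)
    k_ss = sum(ai * aj * gaussian_kernel(xi, xj, bandwidth)
               for ai, xi in zip(a, source) for aj, xj in zip(a, source)) / (z * z)
    k_tt = sum(gaussian_kernel(xi, xj, bandwidth)
               for xi in target for xj in target) / (n * n)
    k_st = sum(ai * gaussian_kernel(xi, xj, bandwidth)
               for ai, xi in zip(a, source) for xj in target) / (z * n)
    return k_ss + k_tt - 2.0 * k_st


# Class 0 sits at (0, 0) and class 1 at (5, 5) in both domains.
# The source is imbalanced (3:1) while the target is balanced (2:2).
source = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
source_labels = [0, 0, 0, 1]
target = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]

unweighted = weighted_mmd2(source, source_labels, target, {0: 1.0, 1: 1.0})
reweighted = weighted_mmd2(source, source_labels, target, {0: 2 / 3, 1: 2.0})  # alpha = w^t / w^s
print(unweighted)  # about 0.125: plain MMD sees a discrepancy caused only by class weights
print(reweighted)  # about 0: after re-weighting, the two mean embeddings coincide
```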

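2.3 Multi-layer correction network

Drawing on the idea of deep adaptation networks, we propose a multi-layer correction network for unsupervised domain adaptation (MCDAN). By integrating multi-layer correction and measuring the domain discrepancy with a multi-layer weighted MMD, it jointly learns transferable features and an adaptive classifier to achieve effective unsupervised domain adaptation. The architecture is shown in Fig. 3.

The network modifies the ResNet18 architecture. Correction modules are added to correct the convolutional layers one by one, keeping the discrepancy between source and target samples relatively small throughout the network, and three fully connected layers are introduced, with weighted MMD used to adapt these feature layers so that the final output is an effective classifier. The final loss $\ell$ combines MMD with the regularizer and is minimized during training: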

$ \min \ell = R_{\rm{S}} + \lambda \sum\limits_{l \in \mathit{\boldsymbol{F}}} f_{{\rm{MMD}},w}^2\left( \mathit{\boldsymbol{D}}_s^l, \mathit{\boldsymbol{D}}_t^l \right) $

(6)

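where $\mathit{\boldsymbol{F}}=\{{\rm fc}_1, {\rm fc}_2, {\rm fc}_3\}$ denotes the three fully connected layers, $\mathit{\boldsymbol{D}}^{l}_{s}$ and $\mathit{\boldsymbol{D}}^{l}_{t}$ are the sets of outputs of layer $l$ for the source and target data, respectively, and $\lambda$ is a hyperparameter. The parameter settings are given in the experiments section.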
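A minimal Python sketch of how the terms of Eq. (6) combine for one mini-batch, with $\lambda = 1$ as in the experiments. It assumes that $R_{\rm{S}}$ is averaged over the mini-batch (target samples contribute 0 by Eq. (3)) and that the per-layer weighted MMD values have already been computed; the layer names and numbers are hypothetical.

```python
from typing import Mapping, Sequence

# Hypothetical names for the adapted fully connected layers F = {fc1, fc2, fc3}.
FC_LAYERS = ("fc1", "fc2", "fc3")


def mcdan_objective(r_s_per_sample: Sequence[float],
                    mmd2_per_layer: Mapping[str, float],
                    lam: float = 1.0) -> float:
    """Eq. (6): R_S (averaged over the mini-batch, an assumption)
    plus lam * sum over l in F of the weighted MMD^2 between source and target outputs."""
    r_s = sum(r_s_per_sample) / len(r_s_per_sample)
    adaptation = sum(mmd2_per_layer[name] for name in FC_LAYERS)
    return r_s + lam * adaptation


# Two source samples (R_S > 0) and two target samples (R_S = 0 by Eq. (3)).
loss = mcdan_objective([0.70, 0.50, 0.0, 0.0],
                       {"fc1": 0.02, "fc2": 0.03, "fc3": 0.05},
                       lam=1.0)
print(round(loss, 4))  # 0.4 = (0.70 + 0.50) / 4 + (0.02 + 0.03 + 0.05)
```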

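3 Experiments and analysis

To evaluate the method on both image recognition and handwritten digit recognition, we compare MCDAN with recent deep domain adaptation methods and traditional shallow adaptation methods on popular public datasets.

3.1 Datasets

1) Image recognition. The Office-31 dataset^[21] is used as the standard benchmark for evaluating domain adaptation algorithms. It consists of 4 110 images, each with a resolution of 300×300 pixels, covering 31 categories of office items from three domains: amazon.com (A), a DSLR camera (D), and a webcam (W). Each ordered pair of distinct domains serves as a source/target task, giving the following test order: A→D, A→W, D→A, D→W, W→A, and W→D.

2) Digit recognition. MNIST (Modified National Institute of Standards and Technology)^[22] is a widely used grayscale handwritten digit dataset containing 60 000 training images and 10 000 test images. USPS (United States Postal Service)^[23] is the US Postal Service handwritten digit database, containing 9 298 handwritten digit images whose gray values have been normalized, all 16×16 pixels. The Street View House Numbers dataset (SVHN)^[24] contains house numbers from Google Street View, and its images include real backgrounds. We test the digit classification performance of MCDAN on SVHN→MNIST, MNIST→USPS, and USPS→MNIST; in every task, images are resized to 28×28 pixels. Sample images from each dataset are shown in Fig. 4.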

Fig. 3 Multi-layer correction network

Fig. 4 Sample images of different datasets ((a) amazon; (b) DSLR; (c) webcam; (d) MNIST; (e) USPS; (f) SVHN)

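3.2 Experiments and results

To verify the effectiveness of the proposed method, we compared eight methods on these datasets: two traditional transfer learning methods (transfer component analysis (TCA)^[25] and the geodesic flow kernel (GFK)^[26]), three deep domain adaptation methods (deep domain confusion (DDC)^[8], the deep adaptation network (DAN)^[9], and the residual transfer network (RTN)^[15]), two mainstream deep neural network models (AlexNet^[5] and ResNet18^[14]), and the proposed method.

During training, the weighted corrections are initialized with the scheme of Ref. [27], and the source model is trained by stochastic gradient descent with a learning rate of 0.1, a weight decay of 0.000 1, and Nesterov momentum^[28] of 0.9. The deep neural network models AlexNet and ResNet18 are trained with Adam^[29] at a base learning rate of 0.01, with the remaining settings as in Ref. [29]. The hyperparameter $\lambda$ of the loss function is set to 1. All experiments use mini-batch training with a batch size of 512, and each mini-batch contains equal numbers of source and target samples. All images in the target domain are used as test samples; each experiment is repeated five times and the average is reported in Table 1.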
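The two training details above, equal numbers of source and target samples per mini-batch and SGD with weight decay and Nesterov momentum, can be sketched in Python as follows. The Nesterov update follows the formulation used by PyTorch's `torch.optim.SGD` with `nesterov=True` and zero dampening; whether the authors used exactly this variant is not stated. The dataset sizes in the usage lines are hypothetical.

```python
import random
from typing import List, Tuple


def balanced_batch(rng: random.Random, n_source: int, n_target: int,
                   batch_size: int = 512) -> Tuple[List[int], List[int]]:
    """Draw one mini-batch with batch_size // 2 source and batch_size // 2 target indices,
    sampled without replacement within each domain."""
    half = batch_size // 2
    return rng.sample(range(n_source), half), rng.sample(range(n_target), half)


def sgd_nesterov_step(params: List[float], grads: List[float], bufs: List[float],
                      lr: float = 0.1, momentum: float = 0.9,
                      weight_decay: float = 1e-4) -> Tuple[List[float], List[float]]:
    """One SGD step with L2 weight decay and Nesterov momentum (dampening = 0)."""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        g = g + weight_decay * p   # gradient of the L2 weight-decay term
        b = momentum * b + g       # momentum (velocity) buffer
        step = g + momentum * b    # Nesterov look-ahead direction
        new_params.append(p - lr * step)
        new_bufs.append(b)
    return new_params, new_bufs


rng = random.Random(0)
src_idx, tgt_idx = balanced_batch(rng, n_source=2000, n_target=800)  # hypothetical sizes
print(len(src_idx), len(tgt_idx))  # 256 256

params, bufs = sgd_nesterov_step([1.0], [0.0], [0.0])
print(params)  # with a zero loss gradient, weight decay alone shrinks the parameter slightly
```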

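To verify the method on digit recognition, four methods, LeNet^[22], SA (subspace alignment)^[30], DAN, and the proposed method, are evaluated on the training images of three benchmarks: MNIST (M), SVHN (S), and USPS (U). LeNet is a classic convolutional neural network model commonly used for digit recognition; SA is a subspace alignment method for domain adaptation that learns a feature mapping aligning the source samples with the target samples. All source models are trained by stochastic gradient descent with a learning rate of 0.01, a weight decay of 0.000 1, and Nesterov momentum of 0.9, with a mini-batch size of 256. All images in the target domain are used as test samples; each experiment is repeated five times and the average is reported in Table 2.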

Table 1 Classification accuracy (%) of different methods on the Office-31 dataset

Method	A→W	D→W	W→D	A→D	W→A	D→A	Average
TCA^[25]	56.7±0.0	89.8±0.0	86.2±0.0	54.3±0.0	48.6±0.0	45.5±0.0	63.5
GFK^[26]	55.9±0.0	91.6±0.0	88.4±0.0	55.7±0.0	49.0±0.0	47.5±0.0	64.7
AlexNet^[25]	58.6±0.3	94.7±0.2	96.9±0.1	61.3±0.2	44.7±0.5	46.3±0.3	67.1
ResNet18^[14]	62.2±0.2	95.0±0.1	97.3±0.2	68.9±0.2	48.2±0.1	46.8±0.2	69.7
DDC^[8]	58.8±0.5	95.2±0.4	96.4±0.2	62.9±0.3	45.3±0.5	48.2±0.4	67.8
DAN^[9]	66.3±0.3	95.6±0.4	97.8±0.3	65.4±0.2	51.8±0.3	49.5±0.1	71.1
RTN^[15]	71.4±0.2	96.2±0.2	98.6±0.1	68.0±0.1	49.7±0.3	50.3±0.2	72.4
MCDAN (ours)	77.6±0.5	98.5±0.2	99.2±0.1	76.3±0.2	57.9±0.3	55.6±0.0	77.5
Note: MCDAN (ours) achieves the best result in every column.

Table 2 Classification accuracy (%) of different methods on the MNIST, USPS, and SVHN datasets

Method	S→M	M→U	U→M	Average
LeNet^[22]	55.6±0.5	60.2±0.4	47.4±0.5	54.4
SA^[30]	59.1±0.3	63.6±0.3	54.7±0.5	59.2
DAN^[9]	66.5±0.3	68.4±0.4	61.6±0.6	65.5
MCDAN (ours)	71.6±0.3	74.5±0.3	67.1±0.4	71.1
Note: MCDAN (ours) achieves the best result in every column.

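3.3 Results analysis

Table 1 shows the image adaptation results. On most tasks, the deep domain adaptation methods outperform shallow methods such as GFK and TCA, indicating that deep domain adaptation can learn more domain-invariant and discriminative representations and thereby improve final classification performance. The results also show that the adaptation tasks involving the amazon dataset are the most challenging. The non-adaptive models AlexNet and ResNet18 achieve a certain level of classification performance but still have high target error, which shows that adaptation is necessary even for deeper source networks. Compared with the MMD-based methods (DDC and DAN) and the residual-based method (RTN), the proposed method, with its multi-layer correction and multi-layer weighted MMD, has a lower target error and substantially higher target accuracy, raising the average accuracy from 72.4% (RTN) to 77.5%.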
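The comparison in Table 1 can be re-derived from the table itself. This Python snippet uses only the values copied from Table 1: it recomputes each method's average, finds the strongest baseline on each task, and computes MCDAN's margin over it in percentage points, together with the mean gap over the seven baselines.

```python
from typing import Dict, List, Tuple

TASKS = ["A→W", "D→W", "W→D", "A→D", "W→A", "D→A"]
# Mean accuracies (%) copied from Table 1.
TABLE1: Dict[str, List[float]] = {
    "TCA":      [56.7, 89.8, 86.2, 54.3, 48.6, 45.5],
    "GFK":      [55.9, 91.6, 88.4, 55.7, 49.0, 47.5],
    "AlexNet":  [58.6, 94.7, 96.9, 61.3, 44.7, 46.3],
    "ResNet18": [62.2, 95.0, 97.3, 68.9, 48.2, 46.8],
    "DDC":      [58.8, 95.2, 96.4, 62.9, 45.3, 48.2],
    "DAN":      [66.3, 95.6, 97.8, 65.4, 51.8, 49.5],
    "RTN":      [71.4, 96.2, 98.6, 68.0, 49.7, 50.3],
    "MCDAN":    [77.6, 98.5, 99.2, 76.3, 57.9, 55.6],
}


def margins_over_best_baseline(table: Dict[str, List[float]],
                               ours: str = "MCDAN") -> Dict[str, Tuple[str, float]]:
    """For each task: (strongest baseline, our accuracy minus its accuracy in points)."""
    result = {}
    for i, task in enumerate(TASKS):
        best = max((name for name in table if name != ours), key=lambda name: table[name][i])
        result[task] = (best, round(table[ours][i] - table[best][i], 1))
    return result


averages = {name: round(sum(acc) / len(acc), 1) for name, acc in TABLE1.items()}
baseline_avgs = [v for name, v in averages.items() if name != "MCDAN"]
mean_gap = round(averages["MCDAN"] - sum(baseline_avgs) / len(baseline_avgs), 1)

for task, (best, gap) in margins_over_best_baseline(TABLE1).items():
    print(f"{task}: strongest baseline {best}, margin {gap} points")
print("average accuracies:", averages)
print("mean gap over the seven baselines:", mean_gap)  # 9.5
```

The margins vary considerably by task (smallest on W→D, largest on A→D), which is worth keeping in mind when reading the averaged improvement.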

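Table 2 shows the digit adaptation results. The proposed method achieves higher accuracy and better classification performance on every task; its average accuracy is 16.7 percentage points higher than that of LeNet, 11.9 points higher than SA, and 5.6 points higher than DAN.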
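The gaps for Table 2 follow from its reported averages. This Python snippet recomputes them in percentage points from the last column of Table 2 and also averages them over the three baselines.

```python
from typing import Dict

# Average accuracies (%) as reported in the last column of Table 2.
REPORTED_AVG: Dict[str, float] = {"LeNet": 54.4, "SA": 59.2, "DAN": 65.5, "MCDAN": 71.1}


def gaps_over_baselines(avgs: Dict[str, float], ours: str = "MCDAN") -> Dict[str, float]:
    """Difference in percentage points between our average accuracy and each baseline's."""
    return {name: round(avgs[ours] - acc, 1) for name, acc in avgs.items() if name != ours}


gaps = gaps_over_baselines(REPORTED_AVG)
mean_gap = round(sum(gaps.values()) / len(gaps), 1)
print(gaps)      # {'LeNet': 16.7, 'SA': 11.9, 'DAN': 5.6}
print(mean_gap)  # 11.4
```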

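4 Conclusion

This paper proposes an unsupervised domain adaptation method based on multi-layer correction that enables end-to-end learning of an adaptive classifier and transferable features. The method uses multi-layer correction to adjust an existing network structure and learns domain-invariant features by aligning the data representations of the source and target domains, which makes classification more effective. The added weighted MMD effectively reduces the impact of class weight bias and substantially improves classification accuracy. The multi-layer structure increases the capacity of the neural network and improves its generalization performance. Comprehensive experiments show that the proposed method achieves high cross-domain classification accuracy and outperforms other competitive unsupervised domain adaptation methods. The accuracy and structure of the algorithm still leave room for improvement; future work will further optimize the residual correction layers and the weighted MMD layers and refine the overall network structure to make it faster and more accurate.

References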

[1] Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345–1359. [DOI:10.1109/TKDE.2009.191]

[2] Long M S, Wang J M, Cao Y, et al. Deep learning of transferable representation for scalable domain adaptation[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8): 2027–2040. [DOI:10.1109/TKDE.2016.2554549]

[3] Bruzzone L, Marconcini M. Domain adaptation problems:a DASVM classification technique and a circular validation strategy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(5): 770–787. [DOI:10.1109/TPAMI.2009.57]

[4] Tang S, Ye M, Li X D. Domain adaptation object recognition[J]. ZTE Technology Journal, 2017, 23(4): 25–31. [DOI:10.3969/j.issn.1009-6868.2017.04.005]

[5] Lunga D, Yang H L, Reith A, et al. Domain-adapted convolutional networks for satellite image classification:a large-scale interactive learning workflow[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(3): 962–977. [DOI:10.1109/JSTARS.2018.2795753]

[6] Sohn K, Liu S F, Zhong G Y, et al. Unsupervised domain adaptation for face recognition in unlabeled videos[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 5917–5925. [DOI:10.1109/ICCV.2017.630]

[7] Chen Y H, Li W, Sakaridis C, et al. Domain adaptive faster R-CNN for object detection in the wild[EB/OL]. 2018-03-08 [2018-08-12]. https://arxiv.org/pdf/1803.03243.pdf.

[8] Tzeng E, Hoffman J, Zhang N, et al. Deep domain confusion: maximizing for domain invariance[EB/OL]. 2014-12-10 [2018-08-12]. https://arxiv.org/pdf/1412.3474.pdf.

[9] Long M S, Cao Y, Wang J M, et al. Learning transferable features with deep adaptation networks[EB/OL]. 2015-05-27 [2018-08-12]. https://arxiv.org/pdf/1502.02791.pdf.

[10] Long M S, Zhu H, Wang J M, et al. Deep transfer learning with joint adaptation networks[EB/OL]. 2016-05-21 [2018-08-12]. https://arxiv.org/pdf/1605.06636.pdf.

[11] Li S, Song S J, Wu C. Layer-wise domain correction for unsupervised domain adaptation[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(1): 91–103. [DOI:10.1631/FITEE.1700774]

[12] Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks?[EB/OL]. 2014-11-06 [2018-08-12]. https://arxiv.org/pdf/1411.1792.pdf.

[13] He K M, Sun J. Convolutional neural networks at constrained time cost[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5353–5360. [DOI:10.1109/CVPR.2015.7299173]

[14] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770–778. [DOI:10.1109/CVPR.2016.90]

[15] Long M S, Zhu H, Wang J M, et al. Unsupervised domain adaptation with residual transfer networks[EB/OL]. 2016-02-14 [2018-08-12]. https://arxiv.org/pdf/1602.04433.pdf.

[16] Borgwardt K M, Gretton A, Rasch M J, et al. Integrating structured biological data by kernel maximum mean discrepancy[J]. Bioinformatics, 2006, 22(14): e49–e57. [DOI:10.1093/bioinformatics/btl242]

[17] Gretton A, Borgwardt K M, Rasch M J, et al. A kernel two-sample test[J]. Journal of Machine Learning Research, 2012, 13: 723–773.

[18] Li Y J, Swersky K, Zemel R. Generative moment matching networks[EB/OL]. 2015-02-10 [2018-08-12]. https://arxiv.org/pdf/1502.02761.pdf.

[19] Gardner J R, Upchurch P, Kusner M J, et al. Deep manifold traversal: changing labels with convolutional features[EB/OL]. 2015-11-19 [2018-08-12]. https://arxiv.org/pdf/1511.06421.pdf.

[20] Wang Y N, Qin P L, Li C P, et al. Improved algorithm of image super resolution based on residual neural network[J]. Journal of Computer Application, 2018, 38(1): 246–254. [DOI:10.11772/j.issn.1001-9081.2017061461]

[21] Saenko K, Kulis B, Fritz M, et al. Adapting visual category models to new domains[C]//Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springer, 2010: 213–226. [DOI:10.1007/978-3-642-15561-1_16]

[22] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. [DOI:10.1109/5.726791]

[23] Hull J J. A database for handwritten text recognition research[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994, 16(5): 550–554. [DOI:10.1109/34.291440]

[24] Netzer Y, Wang T, Coates A, et al. Reading digits in natural images with unsupervised feature learning[C]//Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Granada, Spain: NIPS, 2011: 1–9.

[25] Pan S J, Tsang I W, Kwok J T, et al. Domain adaptation via transfer component analysis[J]. IEEE Transactions on Neural Networks, 2011, 22(2): 199–210. [DOI:10.1109/TNN.2010.2091281]

[26] Gong B Q, Shi Y, Sha F, et al. Geodesic flow kernel for unsupervised domain adaptation[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 2066–2073. [DOI:10.1109/CVPR.2012.6247911]

[27] He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026–1034. [DOI:10.1109/ICCV.2015.123]

[28] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates Inc., 2013: 3111–3119.

[29] Kingma D P, Ba J. Adam: a method for stochastic optimization[EB/OL]. 2014-12-22 [2018-08-12]. https://arxiv.org/pdf/1412.6980.pdf.

[30] Fernando B, Habrard A, Sebban M, et al. Unsupervised visual domain adaptation using subspace alignment[C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013: 2960–2967.