Learning transferrable attention for joint balanced domain adaptation
2019, Vol. 24, No. 7, Pages 1116-1125
Received: 2018-08-27; Revised: 2019-01-17; Published in print: 2019-07-16
DOI: 10.11834/jig.180497
Objective
Existing image recognition methods perform well when the training and test data are drawn from the same distribution, but they break down in practical scenarios, where recognition accuracy drops. Domain adaptation is an effective way to solve such problems: it addresses data that come from two related domains with different distributions.
Method
Based on an analysis of the data distributions, we propose a joint balanced adaptation method with attention transfer, which transfers image features extracted from labeled source-domain data to the unlabeled target domain. First, an attention transfer mechanism migrates the spatial category information of the labeled source-domain data to the unlabeled target domain; by defining the attention of a convolutional neural network, the attended information is used to improve recognition accuracy. Second, a prior distribution over the network parameters is introduced on the basis of the target dataset, giving the network the ability to automatically adjust the feature alignment of each domain alignment layer. Finally, a cross-domain bias describes the input distribution of each domain-specific feature alignment layer, quantitatively expressing the degree of domain adaptation learned by each layer.
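As a hypothetical illustration of such a quantitative measure (not the paper's exact statistic), the gap between the source and target input distributions at a layer can be summarized by the distance between their mean feature vectors:

```python
import math

def cross_domain_bias(src_feats, tgt_feats):
    """Euclidean distance between the mean feature vectors of the
    source and target inputs of a layer -- a simple scalar proxy for
    how far apart the two input distributions are at that layer.
    Each argument is a list of equal-length feature vectors."""
    d = len(src_feats[0])
    mean = lambda feats: [sum(x[j] for x in feats) / len(feats)
                          for j in range(d)]
    ms, mt = mean(src_feats), mean(tgt_feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ms, mt)))
```

A bias near zero suggests the layer's inputs are already well aligned across domains, while a large value flags a layer where more adaptation is needed.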
Result
The method achieves an average recognition accuracy of 77.6% on the Office-31 dataset and 90.7% on the Office-Caltech dataset. It not only outperforms traditional handcrafted-feature methods by a large margin but also matches the recognition performance of current state-of-the-art methods.
Conclusion
The joint balanced domain adaptation method with attention transfer not only achieves high recognition accuracy but also automatically learns the degree of feature alignment between domains, and it verifies that inter-domain feature transfer improves network optimization.
Objective
Many image recognition methods demonstrate good performance when applied to training and test data drawn from the same distribution. However, these methods are unsuitable in practical scenarios and yield low performance there. Domain adaptation is an effective approach for solving such problems; it aims to handle data that come from two related domains but follow different distributions. In practical applications, labeling data takes substantial manual labor, so unsupervised learning has become a clear trend in image recognition. Transfer learning can extract knowledge from the labeled data in the source domain and transfer it to the unlabeled target domain.
Method
We propose a joint balanced adaptive method based on an attention transfer mechanism, which transfers feature representations extracted from the labeled datasets in the source domain to the unlabeled datasets in the target domain. Specifically, we first transfer the spatial category information of the labeled source domain to the unlabeled target domain via the attention transfer mechanism. Neural networks reflect basic characteristics of the human brain, and attention is an important part of the human visual experience that is closely related to perception. Artificial attention mechanisms have developed as artificial neural networks have become increasingly popular in fields such as computer vision and pattern recognition, and letting a system learn to attend to objects has become a tool for understanding the mechanisms behind neural networks. Attention information can significantly improve image recognition accuracy by defining the attention of convolutional neural networks (CNNs). In this study, attention is a set of spatial maps that encode the regions of the network input on which the network focuses most when determining its output.
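Under one common definition from the attention-transfer literature, a layer's attention map is the channel-wise sum of squared activations, L2-normalized over the spatial grid, and the transfer loss is the distance between the normalized source and target maps. The following pure-Python sketch illustrates that definition; the paper's exact formulation may differ, and all names here are illustrative:

```python
import math

def attention_map(activations):
    """Collapse a C x H x W activation tensor (nested lists) into a
    spatial attention map by summing squared values over channels,
    then L2-normalising the flattened map."""
    C, H, W = len(activations), len(activations[0]), len(activations[0][0])
    amap = [[sum(activations[c][i][j] ** 2 for c in range(C))
             for j in range(W)] for i in range(H)]
    norm = math.sqrt(sum(v * v for row in amap for v in row)) or 1.0
    return [[v / norm for v in row] for row in amap]

def attention_transfer_loss(src_act, tgt_act):
    """Squared L2 distance between the normalised source and target
    attention maps -- the quantity minimised when transferring spatial
    category information across domains."""
    qs, qt = attention_map(src_act), attention_map(tgt_act)
    return sum((a - b) ** 2
               for rs, rt in zip(qs, qt)
               for a, b in zip(rs, rt))
```

Because the maps are normalized, the loss depends only on where activation energy is concentrated, not on its overall magnitude: uniformly rescaled activations produce the same attention map.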
Second, we introduce a prior distribution over the network parameters on the basis of the target dataset and endow each alignment layer with the capability of automatically learning the alignment degree that should be pursued at its level of the network. We aim to explore abundant source-domain attributes through cross-domain learning and to capture complex cross-domain knowledge by embedding cross-dataset information, minimizing the original loss of the learning tasks in both domains as much as possible. Classical machine learning recognizes refined features after raw data are preprocessed into features on the basis of human prior knowledge; because recognition results depend on feature quality, experts long spent most of their effort on feature design. Recent breakthroughs in object recognition have mainly been achieved by deep CNNs, whose feature extraction and image representation capabilities surpass manually defined features such as HOG and SIFT. The higher the network layer, the more specific its features are to the target categorization task. Meanwhile, features on successive layers interact in a complex and fragile way, so the neurons of neighboring layers co-adapt during training. Therefore, the transferability of features and classifiers decreases as the cross-domain difference increases.
Finally, we describe the input distribution of each domain-specific alignment layer by introducing cross-domain biases, thereby quantitatively indicating the degree of inter-domain adaptation that each layer learns. Meanwhile, we adaptively change the weight of each category in the dataset. A deep CNN is a unified training and prediction framework that combines multi-level feature extractors and recognizers, so end-to-end processing is particularly important; the design of our model fully utilizes the capability of CNNs to perform end-to-end processing.
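A domain alignment layer of this kind can be sketched by blending source and target batch statistics before normalization, loosely following the AutoDIAL-style scheme the text alludes to. Here the blending parameter `alpha` stands in for the learnable alignment degree; the function names are illustrative, not the paper's implementation:

```python
import math

def align_stats(batch):
    """Per-feature mean and (population) variance of a batch,
    given as a list of equal-length feature vectors."""
    n, d = len(batch), len(batch[0])
    mean = [sum(x[j] for x in batch) / n for j in range(d)]
    var = [sum((x[j] - mean[j]) ** 2 for x in batch) / n for j in range(d)]
    return mean, var

def domain_align(batch_s, batch_t, alpha, eps=1e-5):
    """Normalise source and target batches with cross-domain blended
    statistics. With alpha in [0.5, 1]: alpha=1 keeps the domains
    fully separate (domain-specific normalisation), alpha=0.5 fully
    couples them (shared statistics). In the network, alpha is a
    learnable parameter, so each alignment layer discovers its own
    degree of domain alignment during training."""
    ms, vs = align_stats(batch_s)
    mt, vt = align_stats(batch_t)
    # Blend each domain's statistics with the other domain's.
    ms_a = [alpha * a + (1 - alpha) * b for a, b in zip(ms, mt)]
    vs_a = [alpha * a + (1 - alpha) * b for a, b in zip(vs, vt)]
    mt_a = [alpha * a + (1 - alpha) * b for a, b in zip(mt, ms)]
    vt_a = [alpha * a + (1 - alpha) * b for a, b in zip(vt, vs)]
    norm = lambda batch, m, v: [
        [(x[j] - m[j]) / math.sqrt(v[j] + eps) for j in range(len(m))]
        for x in batch]
    return norm(batch_s, ms_a, vs_a), norm(batch_t, mt_a, vt_a)
```

Because `alpha` is trained jointly with the rest of the network, no manual decision is needed about how strongly each layer should align the two domains.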
Result
The average recognition accuracies of the method on the Office-31 and Office-Caltech datasets are 77.6% and 90.7%, respectively. The method thus significantly outperforms traditional methods based on handcrafted features and is comparable with state-of-the-art methods. Although not every single transfer task achieves the best result, the average recognition accuracy over the six transfer tasks is improved compared with current mainstream methods.
Conclusion
Transferring image features extracted from labeled data in the source domain to the unlabeled target domain effectively solves the problem of data from two related but differently distributed domains. The method fully utilizes the spatial location information of the labeled source-domain data through the attention transfer mechanism and uses a deep CNN to learn the alignment degree of features between domains automatically. Learning ability largely depends on the degree of inter-domain correlation, which is a major limitation of transfer learning; knowledge transfer is ineffective if no similarity exists between the domains. We therefore fully consider the feature correlation between the source and target datasets and adaptively change the weight of each category in the dataset. Our method not only obtains high recognition accuracy but also automatically learns the degree of feature alignment between domains, and it verifies that inter-domain feature transfer can improve network optimization.
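One standard way to realize such adaptive category weighting is inverse-frequency reweighting, where a class's weight is inversely proportional to its sample count so that rare classes are not drowned out by frequent ones. This is an illustrative sketch, not necessarily the exact scheme used in the paper:

```python
def class_weights(labels):
    """Inverse-frequency class weights, normalised so they average
    to 1 across classes: rare classes get weight > 1, frequent
    classes get weight < 1."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    k, n = len(counts), len(labels)
    return {c: n / (k * m) for c, m in counts.items()}
```

The resulting weights would typically scale each sample's loss term during training, rebalancing the contribution of each category.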