Complemented attention method for fine-grained image classification
2021, Vol. 26, No. 12, Pages: 2860-2869
Published in print: 2021-12-16
Accepted: 2020-10-19
DOI: 10.11834/jig.200426
Xun Zhao, Jiabao Wang, Yang Li, Yapeng Wang, Zhuang Miao. Complemented attention method for fine-grained image classification[J]. Journal of Image and Graphics, 2021,26(12):2860-2869.
Objective
Because the objects to be classified exhibit subtle inter-class differences and large intra-class variations, fine-grained classification has long been a challenging task. Most methods use attention mechanisms to learn salient local features of the target. However, traditional attention mechanisms tend to focus only on the most salient local features of the target while suppressing the secondary salient information of other regions, even though this suppressed information usually also contains effective features of the target. To fully extract the effective salient features of the target, this paper proposes a simple yet effective complemented attention mechanism.
Method
Building on the SE (squeeze-and-excitation) attention mechanism, a new attention mechanism, called the complemented attention mechanism (complemented SE, CSE), is proposed. It extracts the primary salient local features from the original features and also extracts secondary salient features from the suppressed residual channel information. These features are complementary, and fusing them yields a more efficient feature representation.
Result
The proposed method is validated on four fine-grained datasets: CUB-Birds (Caltech-UCSD Birds-200-2011), Stanford Dogs, Stanford Cars, and FGVC-Aircraft (fine-grained visual classification of aircraft). With ResNet50 as the backbone network, the classification accuracy on the test sets reaches 87.9%, 89.1%, 93.9%, and 92.4%, respectively. The experimental results show that the proposed method surpasses the current best-performing methods on CUB-Birds and Stanford Dogs, and its performance on Stanford Cars and FGVC-Aircraft is close to that of current mainstream methods.
Conclusion
The proposed method focuses on improving the feature-extraction capability of the attention mechanism to obtain an efficient representation of target features, and it can be applied to fine-grained image classification and other computer vision tasks involving feature extraction.
Objective
Fine-grained image classification aims to distinguish objects among very similar categories, such as subcategories of birds, dogs, and cars, in contrast with coarse-grained image classification. Because of small inter-class variation and large intra-class variation, fine-grained image classification is more challenging than general image classification. The key is to extract the subtle discriminative features of the object. The attention mechanism can actively learn the salient features of the target and has therefore been widely used in image feature extraction tasks. However, traditional attention mechanisms, e.g., the SE (squeeze-and-excitation) attention mechanism, the OSME (one-squeeze multi-excitation) attention mechanism, and the BAM (bottleneck attention module), cannot adequately extract the effective characteristics of the object: they focus on the most salient features of the target and suppress the feature representation of other regions, yet the suppressed regions usually also contain effective features of the target. A more adequate feature representation can be obtained by extracting features from these suppressed regions. To this end, a new attention mechanism, called the complemented attention mechanism (complemented SE, CSE), is proposed, which can extract more effective features of the target.
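For reference, the SE attention mechanism that CSE builds on gates feature channels with a squeeze step (global average pooling) followed by an excitation step (a two-layer bottleneck ending in a sigmoid). A minimal PyTorch sketch of a standard SE block follows; this is the well-known design from Hu et al., written here for illustration, not the authors' released code:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block: learns per-channel weights
    in (0, 1) and rescales the input channels with them."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(             # excitation: bottleneck gating
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # salient channels are emphasized, the rest suppressed
```

The channel weights w are exactly what CSE exploits: the channels that SE suppresses (small w) still carry useful information about the target.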
Method
A new complemented attention mechanism, CSE, is proposed based on the SE attention mechanism. It works in three steps: 1) the SE attention mechanism is used to extract the most significant discriminative features of the target, together with the suppressed features; 2) the SE attention mechanism is applied again to the suppressed features to extract secondary salient features; 3) the two kinds of features are fused to obtain a more efficient feature representation. Moreover, a cross-layer network structure extracts the salient features of different layers and fuses them into the final feature representation, so that all the information of the object is mined (see the sketch below).
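A rough PyTorch sketch of these three steps follows. The abstract does not pin down how the suppressed features are formed or how the fusion is done; taking the suppressed features as the input weighted by (1 - w) and fusing by summation are our assumptions, and `SEGate` mirrors the SE block shown above:

```python
import torch
import torch.nn as nn

class SEGate(nn.Module):
    """SE squeeze-and-excitation that returns only the channel weights w."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        return self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)

class CSEBlock(nn.Module):
    """Sketch of a complemented SE (CSE) block."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.se_main = SEGate(channels, reduction)
        self.se_comp = SEGate(channels, reduction)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.se_main(x)          # step 1: primary channel attention
        primary = x * w              # most salient features
        suppressed = x * (1.0 - w)   # complementary, suppressed information (assumed form)
        secondary = suppressed * self.se_comp(suppressed)  # step 2: secondary salient features
        return primary + secondary   # step 3: fusion, here by summation (assumed)
```

In the cross-layer structure described above, such blocks could be attached to several backbone stages and their pooled outputs fused into the final representation; the exact wiring is not specified in the abstract.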
In the experimental stage, the model is developed in PyTorch, with ResNet50 (pretrained on ImageNet) as the convolutional neural network (CNN) backbone. The input images are resized to 448 × 448 pixels for training and testing. The model is trained with SGD (stochastic gradient descent) using a momentum of 0.9, a weight decay of 0.000 5, and a learning rate of 0.001, for 150 epochs, with the learning rate decayed by a factor of 0.1 every 30 epochs.
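These reported hyperparameters map directly onto standard PyTorch optimizer and scheduler settings. A hedged sketch of the setup follows; the classifier-head replacement and the training-loop body are placeholders of ours, not details from the paper:

```python
import torch
import torchvision

# ImageNet-pretrained ResNet50 backbone, as reported in the paper.
model = torchvision.models.resnet50(pretrained=True)
# Hypothetical head for a 200-class dataset such as CUB-Birds.
model.fc = torch.nn.Linear(model.fc.in_features, 200)

# Reported recipe: SGD, momentum 0.9, weight decay 0.0005, initial lr 0.001,
# 150 epochs, lr multiplied by 0.1 every 30 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(150):
    # ... one training pass over 448x448-pixel inputs (data loading omitted) ...
    scheduler.step()
```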
Result
To verify its effectiveness, experiments are conducted on four fine-grained datasets: CUB-Birds, Stanford Dogs, Stanford Cars, and FGVC-Aircraft. The classification accuracy reaches 87.9%, 89.1%, 93.9%, and 92.4%, respectively. The results show that the method matches the state-of-the-art methods. In the ablation study, the feature-extraction capability of three attention mechanisms (SE, OSME, and CSE) is compared under the same conditions. Compared with the SE and OSME attention mechanisms, the CSE attention mechanism improves accuracy by 1.1% and 0.6%, respectively, on the CUB-Birds dataset, and by 1.7% and 1%, respectively, on the Stanford Dogs dataset. Feature visualization is also conducted to inspect the regional features attended by each mechanism more intuitively. All results show that the CSE attention mechanism has a more powerful feature-extraction ability than the SE and OSME attention mechanisms. The validity of each structure in the network is further verified on the CUB-Birds dataset.
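The abstract does not say which visualization technique is used; one simple way to inspect what an attention block emphasizes is to channel-average its attended feature map and upsample it to image resolution as a heatmap, as in this illustrative sketch:

```python
import torch
import torch.nn.functional as F

def attention_heatmap(feat: torch.Tensor, size=(448, 448)) -> torch.Tensor:
    """Collapse an attended feature map of shape (B, C, H, W) into a
    (B, size[0], size[1]) heatmap in [0, 1] for overlay on the input image."""
    heat = feat.mean(dim=1, keepdim=True)  # average over channels -> (B, 1, H, W)
    heat = F.interpolate(heat, size=size, mode="bilinear", align_corners=False)
    heat = heat - heat.amin(dim=(2, 3), keepdim=True)       # shift to zero minimum
    heat = heat / (heat.amax(dim=(2, 3), keepdim=True) + 1e-6)  # scale to [0, 1]
    return heat.squeeze(1)
```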
Conclusion
To solve the problem of insufficient feature extraction in traditional attention mechanisms, a complemented attention method for fine-grained image classification is proposed, which focuses on improving the ability of the attention mechanism to extract features and on obtaining an efficient representation of target features. In the ablation study, the CSE attention mechanism attends to discriminative regional characteristics better than the SE and OSME attention mechanisms.
Keywords: fine-grained; attention mechanism; image classification; feature extraction; feature representation
Chen Y, Bai Y L, Zhang W and Mei T. 2019. Destruction and construction learning for fine-grained image recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5152-5161
Ding Y, Zhou Y Z, Zhu Y, Ye Q X and Jiao J B. 2019. Selective sparse sampling for fine-grained image recognition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6598-6607
Feng Y S and Wang Z L. 2016. Fine-grained image categorization with segmentation based on top-down attention map. Journal of Image and Graphics, 21(9): 1147-1154 [DOI: 10.11834/jig.20160904]
Fu J L, Zheng H L and Mei T. 2017. Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4476-4484
Hanselmann H and Ney H. 2020. Fine-grained visual classification with efficient end-to-end localization [EB/OL]. [2020-05-20]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2019. Squeeze-and-excitation networks [EB/OL]. [2020-07-30]
Huang S L, Xu Z, Tao D C and Zhang Y. 2016. Part-stacked CNN for fine-grained visual categorization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1173-1182
Ji R Y, Wen L Y, Zhang L B, Du D W, Wu Y J, Zhao C, Liu X L and Huang F Y. 2020. Attention convolutional binary neural tree for fine-grained visual categorization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10465-10474
Khosla A, Jayadevaprakash N, Yao B P and Li F F. 2011. Novel dataset for fine-grained image categorization: Stanford dogs [EB/OL]. [2020-07-30]
Krause J, Stark M, Deng J and Li F F. 2013. 3D object representations for fine-grained categorization//Proceedings of 2013 IEEE International Conference on Computer Vision Workshops. Sydney, Australia: IEEE: 554-561
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 26th Annual Conference on Neural Information Processing Systems. Lake Tahoe, USA: NIPS: 1106-1114
Lin T Y, RoyChowdhury A and Maji S. 2015. Bilinear CNN models for fine-grained visual recognition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1449-1457
Luo W, Yang X T, Mo X J, Lu Y H, Davis L, Li J, Yang J and Lim S N. 2019. Cross-X learning for fine-grained visual categorization//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 8241-8250
Maji S, Rahtu E, Kannala J, Blaschko M B and Vedaldi A. 2013. Fine-grained visual classification of aircraft [EB/OL]. [2020-07-30]
Park J, Woo S, Lee J Y and Kweon I S. 2018. BAM: bottleneck attention module [EB/OL]. [2020-07-30]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2020-07-30]
Sun M, Yuan Y C, Zhou F and Ding E R. 2018. Multi-attention multi-class constraint for fine-grained image recognition//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 834-850
Wah C, Branson S, Welinder P, Perona P and Belongie S. 2011. The Caltech-UCSD Birds-200-2011 dataset [EB/OL]. [2020-07-30]
Wang Y M, Morariu V I and Davis L S. 2018. Learning a discriminative filter bank within a CNN for fine-grained recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4148-4157
Wang Y X and Zhang X B. 2019. Fine-grained image classification with network architecture of focus and recognition. Journal of Image and Graphics, 24(4): 493-502 [DOI: 10.11834/jig.180423]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 3-19
Xiao T J, Xu Y C, Yang K Y, Zhang J X, Peng Y X and Zhang Z. 2015. The application of two-level attention models in deep convolutional neural network for fine-grained image classification//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 842-850
Yang Z, Luo T G, Wang D, Hu Z Q, Gao J and Wang L W. 2018. Learning to navigate for fine-grained classification//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer: 438-454
Zhang N, Donahue J, Girshick R and Darrell T. 2014. Part-based R-CNNs for fine-grained category detection//Proceedings of the European Conference on Computer Vision. Zurich, Switzerland: Springer: 834-849
Zhang X P, Xiong H K, Zhou W G, Lin W Y and Tian Q. 2016. Picking deep filter responses for fine-grained image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1134-1142
Zheng H L, Fu J L, Mei T and Luo J B. 2017. Learning multi-attention convolutional neural network for fine-grained image recognition//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5219-5227 [DOI: 10.1109/ICCV.2017.557]
Zhuang P Q, Wang Y L and Qiao Y. 2020. Learning attentive pairwise interaction for fine-grained classification [EB/OL]. [2020-06-10]. https://arxiv.org/pdf/2002.10191.pdf