Review of webly-supervised fine-grained image recognition

Xiushen Wei; Yuyan Xu; Jian Yang

doi:10.11834/jig.210188


2022, Vol. 27, No. 7, pp. 2057-2077
Print publication date: 2022-07-16

Accepted: 2021-04-29

Xiushen Wei, Yuyan Xu, Jian Yang. Review of webly-supervised fine-grained image recognition[J]. Journal of Image and Graphics, 2022, 27(7): 2057-2077. DOI: 10.11834/jig.210188.

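Summary

Fine-grained image recognition aims to visually recognize different sub-categories at the fine-grained level under a traditional semantic category. It has important scientific significance and application value in fields such as the smart new economy and the industrial Internet of Things (for example, smart cities, public safety, ecological protection, and agricultural production and safety assurance). With the help of deep learning, fine-grained image recognition has made great progress, but its dependence on large-scale, high-quality fine-grained image data has become a bottleneck that limits its wider adoption. With the rapid development of the internet and big data, webly-supervised image data, as a free data source, has become a feasible way to ease deep learning's dependence on large datasets, and how to use webly-supervised data effectively has become a popular topic for improving the applicability and generalization of fine-grained image recognition. Centered on fine-grained image recognition and focusing on fine-grained recognition with webly-supervised data, this paper introduces, in turn, fine-grained recognition datasets, traditional fine-grained recognition methods, and the characteristics and methods of webly-supervised fine-grained recognition, and reviews the world's first webly-supervised fine-grained image recognition competition and its champion solutions. Finally, it summarizes and discusses future trends in the field.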

Abstract

Fine-grained image recognition aims to visually recognize different sub-categories at the fine-grained level under a traditional semantic category. In many scenarios, such as smart cities, public safety, ecological protection and agriculture, fine-grained image recognition has important scientific significance and application value. In recent years, fine-grained image recognition has made great progress with the help of deep learning, but its reliance on large-scale, high-quality fine-grained image data has become a bottleneck restricting its promotion and popularization. This review covers traditional fine-grained image recognition and fine-grained recognition under the webly-supervised setting, including the datasets, their characteristics, and the challenges and approaches of webly-supervised fine-grained recognition.

We introduce, in turn, traditional fine-grained image recognition datasets, traditional webly-supervised image recognition datasets and webly-supervised fine-grained image recognition datasets. Specifically, webly-supervised datasets share the large intra-class differences and small inter-class differences of traditional fine-grained datasets. In addition, webly-supervised datasets pose the challenges of noise, data bias and long-tailed distribution. For traditional fine-grained recognition, there are three core paradigms. The first is fine-grained image recognition based on localization-classification sub-networks. The second is fine-grained image recognition via end-to-end feature encoding. The third is fine-grained image recognition that uses external information.

Because the data in webly-supervised datasets is obtained from the internet, it contains a large amount of noise, and noisy data can harm the training of deep models. Noisy data falls into two categories: irrelevant data and ambiguous data. Irrelevant data refers to erroneous data that belongs to none of the categories of interest, such as maps, tables and article screenshots. Ambiguous data refers to images that contain objects related to the labeled category as well as other objects. Two kinds of approaches handle noisy data: clustering and cross validation. We introduce the key clustering methods, analyze their advantages and disadvantages, and discuss their results and potential on webly-supervised fine-grained images. For cross validation, we briefly introduce traditional cross validation and describe a customized cross-validation method used in the ACCV (Asian Conference on Computer Vision) WebFG (webly-supervised fine-grained image recognition) 2020 competition.

On the internet, data is generated and uploaded by users according to their own perceptions, and in this process data bias arises from cultural, political and environmental factors. Because fine-grained categories are so similar to one another, data bias is particularly pronounced in fine-grained datasets. The main methods for reducing data bias are knowledge distillation, label smoothing and data augmentation. Data bias in webly-supervised datasets affects model training, and the dark knowledge produced by knowledge distillation can alleviate it. Knowledge distillation has three learning schemes: offline distillation, online distillation and self-distillation. Label smoothing lowers the model's confidence in the given labels and lessens the effect of erroneous labels on model training. Data bias also means that neither the quantity nor the quality of data can be guaranteed. An effective way to alleviate data bias is to increase the number of samples in the dataset. However, the accuracy of manually added data cannot be guaranteed because the differences between fine-grained categories are small, so data augmentation becomes an effective way to handle data bias in fine-grained datasets.

Only a small number of fine-grained categories are commonly seen in daily life, while many others are rarely seen. Because internet data reflects how categories occur in real life, webly-supervised data follows a long-tailed distribution, which is a further challenge. In general, the main solutions for long-tailed recognition are resampling, reweighting and novel network structures. Resampling weights the images of each category inversely to its number of samples, which leads to two methods: under-sampling the head categories and over-sampling the tail categories. Reweighting acts mainly on the loss function: a larger penalty weight is added to the loss of the tail categories. Novel network structures decouple the learning process into representation learning and classifier learning and train the two parts separately. Experiments indicate that these structures have advantages over resampling and reweighting. In particular, we review the WebFG competition, the world's first webly-supervised fine-grained image recognition competition, held with Nanjing University of Science and Technology, and discuss its champion solutions.
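The resampling and reweighting ideas described in the abstract can be illustrated with a minimal sketch. It assumes a simple inverse-frequency weighting, which is one common choice and not necessarily the scheme used in the surveyed methods or in the WebFG solutions. The function names `inverse_frequency_weights`, `sample_weights` and `weighted_cross_entropy` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to class size.

    Assumption: weight = N / (C * n_c), where N is the number of samples,
    C the number of classes and n_c the size of class c. Tail classes
    therefore receive larger weights than head classes.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def sample_weights(labels):
    """Resampling: per-sample weights for a weighted sampler.

    Drawing samples in proportion to these weights over-samples tail
    classes and under-samples head classes.
    """
    class_w = inverse_frequency_weights(labels)
    return [class_w[y] for y in labels]


def weighted_cross_entropy(probs, label, class_weights):
    """Reweighting: cross-entropy of one sample scaled by its class weight.

    `probs` maps each class to the predicted probability for this sample.
    """
    return -class_weights[label] * math.log(probs[label])


# Toy long-tailed label set: 8 head samples and 2 tail samples.
labels = ["head"] * 8 + ["tail"] * 2
weights = inverse_frequency_weights(labels)
```

For resampling, the per-sample weights could be passed to a weighted sampler, for example `random.choices(samples, weights=sample_weights(labels), k=len(samples))` from the Python standard library, so that each class contributes roughly equally to a training epoch. For reweighting, the same class weights scale the loss so that an error on a tail class is penalized more heavily than the same error on a head class.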

Keywords

webly-supervised; fine-grained image recognition; noise data; long-tailed distribution; small inter-class variance; review

references

Aodha O M, Cole E and Perona P. 2019. Presence-only geographical priors for fine-grained image classification//Proceedings of 2019IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9595-9605 [DOI: 10.1109/ICCV.2019.00969http://dx.doi.org/10.1109/ICCV.2019.00969]

Barbará D and Chen P. 2000. Using the fractal dimension to cluster datasets//Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, USA: ACM: 260-264. [DOI: 10.1145/347090.347145http://dx.doi.org/10.1145/347090.347145]

Belongie S. 2017. Fine-grained visual category recognition and perceptual embedding//New York R Conference. Delivered by Serge Belongie (Cornell University) at the 2017 New York R Conference on April 21st and 22nd at Work-Bench[EB/OL]. [2021-03-31].https://www.youtube.com/watch?v=mD5cuMza6Rchttps://www.youtube.com/watch?v=mD5cuMza6Rc

Berg T, Liu J X, Lee S W, Alexander M L, Jacobs D W and Belhumeur P N. 2014. Birdsnap: large-scale fine-grained visual categorization of birds//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2019-2026 [DOI: 10.1109/CVPR.2014.259http://dx.doi.org/10.1109/CVPR.2014.259]

Bossard L, Guillaumin M and van Gool L. 2014. Food-101-mining discriminative components with random forests//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 446-461 [DOI: 10.1007/978-3-319-10599-4_29http://dx.doi.org/10.1007/978-3-319-10599-4_29]

Branson S, van Horn G, Belongie S and Perona P. 2014. Bird species categorization using pose normalized deep convolutional nets[EB/OL]. [2021-03-31].https://arxiv.org/pdf/1406.2952.pdfhttps://arxiv.org/pdf/1406.2952.pdf

Cao K D, Wei C, Gaidon A, Arechiga N and Ma T Y. 2019. Learning imbalanced datasets with label-distribution-aware margin loss//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : #140

Chawla N V, Bowyer K W, Hall L O and Kegelmeyer W P. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1): 321-357 [DOI: 10.1613/jair.953]

Chua T S, Tang J H, Hong R C, Li H J, Luo Z P and Zheng Y T. 2007. NUS-Wide: a real-world web image database from national university of Singapore//Proceedings of the ACM International Conference on Image and Video Retrieval. Santorini, Greece: ACM: #48 [DOI: 10.1145/1646396.1646452http://dx.doi.org/10.1145/1646396.1646452]

Cimpoi M, Maji S and Vedaldi A. 2015. Deep filter banks for texture recognition and segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3828-3836 [DOI: 10.1109/CVPR.2015.7299007http://dx.doi.org/10.1109/CVPR.2015.7299007]

Corizzo R, Pio G, Ceci M and Malerba D. 2019. DENCAST: distributed density-based clustering for multi-target regression. Journal of Big Data, 6(1): #43 [DOI: 10.1186/s40537-019-0207-2]

Cudeck R and Browne M W. 1983. Cross-validation of covariance structures. Multivariate Behavioral Research, 18(2): 147-167 [DOI: 10.1207/s15327906mbr1802_2]

Cui Q, Jiang Q Y, Wei X S, Li W J and Yoshie O. 2020. ExchNet: a unified hashing network for large-scale fine-grained image retrieval//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 189-205 [DOI: 10.1007/978-3-030-58580-8_12http://dx.doi.org/10.1007/978-3-030-58580-8_12]

Cui Y, Jia M L, Lin T Y, Song Y and Belongie S. 2019. Class-Balanced loss based on effective number of samples//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 9260-9269 [DOI: 10.1109/CVPR.2019.00949http://dx.doi.org/10.1109/CVPR.2019.00949]

Cui Y, Zhou F, Lin Y Q and Belongie S. 2016. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1153-1162 [DOI: 10.1109/CVPR.2016.130http://dx.doi.org/10.1109/CVPR.2016.130]

Cui Y, Zhou F, Wang J, Liu X, Lin Y Q and Belongie S. 2017. Kernel pooling for convolutional neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3049-3058 [DOI: 10.1109/CVPR.2017.325http://dx.doi.org/10.1109/CVPR.2017.325]

Ding Y, Zhou Y Z, Zhu Y, Ye Q X and Jiao J B. 2019. Selective sparse sampling for fine-grained image recognition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6598-6607 [DOI: 10.1109/ICCV.2019.00670http://dx.doi.org/10.1109/ICCV.2019.00670]

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2021-04-02].https://arxiv.org/pdf/2010.11929.pdfhttps://arxiv.org/pdf/2010.11929.pdf

Dubey A, Gupta O, Guo P, Raskar R, Farrell R and Naik N. 2018. Pairwise confusion for fine-grained visual classification//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 71-88 [DOI: 10.1007/978-3-030-01258-8_5http://dx.doi.org/10.1007/978-3-030-01258-8_5]

Fisher D H. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2): 139-172 [DOI: 10.1007/BF00114265]

Follmann P, Böttger T, Härtinger P, König R and Ulrich M. 2018. MVTec D2S: densely segmented supermarket dataset//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 581-597 [DOI: 10.1007/978-3-030-01249-6_35http://dx.doi.org/10.1007/978-3-030-01249-6_35]

Fu J L, Zheng H L and Mei T. 2017. Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4438-4446 [DOI: 10.1109/CVPR.2017.476http://dx.doi.org/10.1109/CVPR.2017.476]

Gao Y, Beijbom O, Zhang N and Darrell T. 2016. Compact bilinear pooling//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 317-326 [DOI: 10.1109/CVPR.2016.41http://dx.doi.org/10.1109/CVPR.2016.41]

Ge W F, Lin X R and Yu Y Z. 2019. Weakly supervised complementary parts models for fine-grained image classification from the bottom up//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3029-3038 [DOI: 10.1109/CVPR.2019.00315http://dx.doi.org/10.1109/CVPR.2019.00315]

Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81http://dx.doi.org/10.1109/CVPR.2014.81]

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, UK: MIT Press: 2672-2680

Guha S, Rastogi R and Shim K. 1998. CURE: an efficient clustering algorithm for large databases//Proceedings of 1998 ACM SIGMOD International Conference on Management of Data. Seattle, USA: ACM: 73-84 [DOI: 10.1145/276304.276312http://dx.doi.org/10.1145/276304.276312]

He J, Chen J N, Liu S, Kortylewski A, Yang C, Bai Y T, Wang C H and Yuille A. 2021. TransFG: a transformer architecture for fine-grained recognition [EB/OL]. [2021-03-31].https://arxiv.org/pdf/2103.07976.pdfhttps://arxiv.org/pdf/2103.07976.pdf

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90http://dx.doi.org/10.1109/CVPR.2016.90]

He X T and Peng Y X. 2017. Weakly supervised learning of part selection model with spatial constraints for fine-grained image classification//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI Press: 4075-4081

Hinton G, Vinyals O and Dean J. 2015. Distilling the knowledge in a neural network [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1503.02531.pdfhttps://arxiv.org/pdf/1503.02531.pdf

Hou S H, Feng Y S and Wang Z L. 2017. VegFru: a domain-specific dataset for fine-grained visual categorization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 541-549 [DOI: 10.1109/ICCV.2017.66http://dx.doi.org/10.1109/ICCV.2017.66]

Huang Z X and Li Y. 2020. Interpretable and accurate fine-grained recognition via region grouping//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8659-8669 [DOI: 10.1109/CVPR42600.2020.00869http://dx.doi.org/10.1109/CVPR42600.2020.00869]

Inoue H. 2018. Data augmentation by pairing samples for images classification [EB/OL]. [2022-04-21].https://arxiv.org/pdf/1801.02929.pdfhttps://arxiv.org/pdf/1801.02929.pdf

Itti L, Koch C and Niebur E. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11): 1254-1259 [DOI: 10.1109/34.730558]

Jaccard N, Rogers T W, Morton E J and Griffin L D. 2017. Detection of concealed cars in complex cargo X-ray imagery using deep learning. Journal of X-ray Science and Technology, 25(3): 323-339 [DOI: 10.3233/XST-16199]

Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2016. Spatial transformer networks [EB/OL]. [2021-04-02].https://arxiv.org/pdf/1506.02025.pdfhttps://arxiv.org/pdf/1506.02025.pdf

Jafarzadegan M, Safi-Esfahani F and Beheshti Z. 2019. Combining hierarchical clustering approaches using the PCA method. Expert Systems with Applications, 137: 1-10 [DOI: 10.1016/j.eswa.2019.06.064]

Ji R Y, Wen L Y, Zhang L B, Du D W, Wu Y J, Zhao C, Liu X L and Huang F Y. 2020. Attention convolutional binary neural tree for fine-grained visual categorization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10465-10474 [DOI: 10.1109/CVPR42600.2020.01048http://dx.doi.org/10.1109/CVPR42600.2020.01048]

Jia D, Krause J, Stark M and Li F F. 2016. Leveraging the wisdom of the crowd for fine-grained recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4): 666-676 [DOI: 10.1109/TPAMI.2015.2439285]

Jo T and Japkowicz N. 2004. Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter, 6(1): 40-49 [DOI: 10.1145/1007730.1007737]

Kang B Y, Xie S N, Rohrbach M, Yan Z C, Gordo A, Feng J S and Kalantidis Y. 2020. Decoupling representation and classifier for long-tailed recognition [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1910.09217.pdfhttps://arxiv.org/pdf/1910.09217.pdf

Kang G L, Dong X Y, Zheng L and Yang Y. 2017. Patchshuffle regularization [EB/OL]. [2022-04-21].https://arxiv.org/pdf/1707.07103.pdfhttps://arxiv.org/pdf/1707.07103.pdf

Khosla A, Jayadevaprakash N, Yao B P and Li F F. 2011. Novel dataset for fine-grained image categorization: Stanford dogs//Proceedings of the 1st Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Springs, USA: IEEE: 806-813

Kohavi R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection//Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Canada: Morgan Kaufmann Publishers Inc. : 1137-1143

Kong S and Fowlkes C. 2017. Low-rank bilinear pooling for fine-grained classification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 7025-7034 [DOI: 10.1109/CVPR.2017.743http://dx.doi.org/10.1109/CVPR.2017.743]

Krasin I, Duerig T, Alldrin N, Veit A, Abu-El-Haija S, Belongie S, Cai D, Feng Z Y, Ferrari V, Gomes V and Gupta A. 2016. OpenImages: a public dataset for large-scale multi-label and multi-class image classification [EB/OL]. [2022-04-21].https://github.com/openimageshttps://github.com/openimages

Krause J, Sapp B, Howard A, Zhou H, Toshev A, Duerig T, Philbin J and Li F F. 2016. The unreasonable effectiveness of noisy data for fine-grained recognition//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 301-320 [DOI: 10.1007/978-3-319-46487-9_19http://dx.doi.org/10.1007/978-3-319-46487-9_19]

Krause J, Stark M, Deng J and Li F F. 2013. 3D object representations for fine-grained categorization//Proceedings of 2013 IEEE International Conference on Computer Vision workshops. Sydney, Australia: IEEE: 554-561 [DOI: 10.1109/iccvw.2013.77http://dx.doi.org/10.1109/iccvw.2013.77]

Kriegel H P, Kröger P and Zimek A. 2009. Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data, 3(1): #1 [DOI: 10.1145/1497577.1497578]

Kriegel H P, Kröger P, Sander J and Zimek A. 2011. Density-based clustering. WIREs Data Mining and Knowledge Discovery, 1(3): 231-240 [DOI: 10.1002/widm.30]

Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc. : 1097-1105

Kubat M and Matwin S. 1997. Addressing the curse of imbalanced training sets: one-sided selection//Proceedings of the 14th International Conference on Machine Learning. Nashville, USA: ICML: 179-186

Lam M, Mahasseni B and Todorovic S. 2017. Fine-grained recognition as HSnet search for informative image parts//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6497-6506 [DOI: 10.1109/CVPR.2017.688http://dx.doi.org/10.1109/CVPR.2017.688]

Lan X, Zhu X T and Gong S G. 2018. Knowledge distillation by on-the-fly native ensemble//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 7528-7538

Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes P N, Hellmann S, Morsey M, Kleef P V, Auer S and Bizer C. 2015. DBpedia-A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2): 167-195 [DOI: 10.3233/SW-140134]

Li H, Liu X J, Li T and Gan R D. 2020a. A novel density-based clustering algorithm using nearest neighbor graph. Pattern Recognition, 102: #107206 [DOI: 10.1016/j.patcog.2020.107206http://dx.doi.org/10.1016/j.patcog.2020.107206]

Li T H, Li J G, Liu Z and Zhang C S. 2020b. Few sample knowledge distillation for efficient network compression//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14627-14635 [DOI: 10.1109/CVPR42600.2020.01465http://dx.doi.org/10.1109/CVPR42600.2020.01465]

Li W, Wang L M, Li W, Agustsson E and van Gool L. 2017. WebVision database: visual learning and understanding from web data [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1708.02862.pdfhttps://arxiv.org/pdf/1708.02862.pdf

Liang D J, Yang F, Zhang T and Yang P. 2018. Understanding mixup training methods. IEEE Access, 6: 58774-58783 [DOI: 10.1109/ACCESS.2018.2872698]

Liu C B, Xie H T, Zha Z J, Ma L F, Yu L Y and Zhang Y D. 2020. Filtration and distillation: enhancing region attention for fine-grained visual categorization//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press: 11555-11562 [DOI: 10.1609/aaai.v34i07.6822http://dx.doi.org/10.1609/aaai.v34i07.6822]

Liu L Q, Shen C H and Van Den Hengel A. 2015. The treasure beneath convolutional layers: cross-convolutional-layer pooling for image classification//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 4749-4757 [DOI: 10.1109/CVPR.2015.7299107http://dx.doi.org/10.1109/CVPR.2015.7299107]

MacQueen J. 1967. Some methods for classification and analysis of multivariate observations//Le Cam L M, Neyman J, eds. The 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: University of California Press: 281-297

Maji S, Rahtu E, Kannala J, Blaschko M and Vedaldi A. 2013. Fine-grained visual classification of aircraft [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1306.5151.pdfhttps://arxiv.org/pdf/1306.5151.pdf

Mandelbrot B B. 1983. An explicit fractal model of percolation clusters//Percolation Structures and Processes. Israel Physical Society

Miao C J, Xie L X, Wan F, Su C, Liu H Y, Jiao J B and Ye Q X. 2019. SIXray: a large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2114-2123 [DOI: 10.1109/CVPR.2019.00222http://dx.doi.org/10.1109/CVPR.2019.00222]

Nilsback M E and Zisserman A. 2008. Automated flower classification over a large number of classes//Proceedings of the 6th Indian Conference on Computer Vision, Graphics and Image Processing. Bhubaneswar, India: IEEE: 722-729 [DOI: 10.1109/ICVGIP.2008.47http://dx.doi.org/10.1109/ICVGIP.2008.47]

Niu L, Veeraraghavan A and Sabharwal A. 2018. Webly supervised learning meets zero-shot learning: a hybrid approach for fine-grained classification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7171-7180 [DOI: 10.1109/CVPR.2018.00749http://dx.doi.org/10.1109/CVPR.2018.00749]

Passalis N and Tefas A. 2018. Learning deep representations with probabilistic knowledge transfer//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 268-284 [DOI: 10.1007/978-3-030-01252-6_17http://dx.doi.org/10.1007/978-3-030-01252-6_17]

Peng Y X, He X T and Zhao J J. 2018. Object-part attention model for fine-grained image classification. IEEE Transactions on Image Processing, 27(3): 1487-1500 [DOI: 10.1109/TIP.2017.2774041]

Pleiss G, Zhang T Y, Elenberg E and Weinberger K Q. 2020. Identifying mislabeled data using the area under the margin ranking [EB/OL]. [2021-03-31].https://arxiv.org/pdf/2001.10528.pdfhttps://arxiv.org/pdf/2001.10528.pdf

Reed S, Akata Z, Lee H and Schiele B. 2016. Learning deep representations of fine-grained visual descriptions//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 49-58 [DOI: 10.1109/CVPR.2016.13http://dx.doi.org/10.1109/CVPR.2016.13]

Ren Z W and Sun Q S. 2021. Simultaneous global and local graph structure preserving for multiple kernel clustering. IEEE Transactions on Neural Networks and Learning Systems, 32(5): 1839-1851 [DOI: 10.1109/TNNLS.2020.2991366]

Romero A, Ballas N, Kahou S E, Chassang A, Gatta C and Bengio Y. 2015. FitNets: hints for thin deep nets [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1412.6550.pdfhttps://arxiv.org/pdf/1412.6550.pdf

Schölkopf B, Knirsch P, Smola A and Burges C. 1998. Fast approximation of support vector kernel expansions, and an interpretation of clustering as approximation in feature spaces//Levi P, Schanz M, Ahlers R J and May F, eds. Mustererkennung. Berlin, Germany: Springer: 125-132 [DOI: 10.1007/978-3-642-72282-0_12http://dx.doi.org/10.1007/978-3-642-72282-0_12]

Shen L, Lin Z C and Huang Q M. 2016. Relay backpropagation for effective learning of deep convolutional neural networks//Proceedings of the 14th European Conference on Computer Vision-ECCV 2016. Amsterdam, the Netherlands: Springer: 467-482 [DOI: 10.1007/978-3-319-46478-7_29http://dx.doi.org/10.1007/978-3-319-46478-7_29]

Steinbach M, Karypis G and Kumar V. 2000. A comparison of document clustering techniques [EB/OL]. [2021-03-01]. The University of Minnesota Digital Conservancy.https://hdl.handle.net/11299/215421https://hdl.handle.net/11299/215421

Stone M. 1974. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological), 36(2): 111-133 [DOI: 10.1111/j.2517-6161.1974.tb00994.x]

Sun G L, Cholakkal H, Khan S, Khan F and Shao L. 2020. Fine-grained recognition: accounting for subtle differences between similar classes//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press: 12047-12054 [DOI: 10.1609/aaai.v34i07.6882http://dx.doi.org/10.1609/aaai.v34i07.6882]

Sun M, Yuan Y C, Zhou F and Ding E R. 2018. Multi-attention multi-class constraint for fine-grained image recognition//Proceedings of the 15thEuropean Conference on Computer Vision. Munich, Germany: Springer: 834-850 [DOI: 10.1007/978-3-030-01270-0_49http://dx.doi.org/10.1007/978-3-030-01270-0_49]

Tan M X and Le Q V. 2019. EfficientNet: rethinking model scaling for convolutional neural networks//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 6105-6114

Tasoulis D K and Vrahatis M N. 2004. Unsupervised distributed clustering//Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks. Innsbruck, Austria: IASTED/ACTA Press: 347-351

van Horn G, Aodha O M, Song Y, Cui Y, Sun C, Shepard A, Adam H, Perona P and Belongie S. 2018. The inaturalist species classification and detection dataset//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8769-8778 [DOI: 10.1109/CVPR.2018.00914http://dx.doi.org/10.1109/CVPR.2018.00914]

van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, Perona P and Belongie S. 2015. Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 595-604 [DOI: 10.1109/CVPR.2015.7298658http://dx.doi.org/10.1109/CVPR.2015.7298658]

Wah C, Branson S, Welinder P, Perona P and Belongie S. 2011. The Caltech-UCSD birds-200-2011 dataset. California Institute of Technology

Wang J and Perez L. 2017. The effectiveness of data augmentation in image classification using deep learning [EB/OL]. [2022-04-21].https://arxiv.org/pdf/1712.04621.pdfhttps://arxiv.org/pdf/1712.04621.pdf

Wang L, Zhang J J, Zhou L P, Tang C and Li W Q. 2015. Beyond covariance: feature representation with nonlinear kernel matrices//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 4570-4578 [DOI: 10.1109/ICCV.2015.519http://dx.doi.org/10.1109/ICCV.2015.519]

Wang W, Yang J and Muntz R R. 1997. STING: a statistical information grid approach to spatial data mining//Proceedings of the 23rd International Conference on Very Large Data Bases. San Francisco, USA: Morgan Kaufmann Publishers Inc. : 186-195

Wang Y M, Morariu V I and Davis L S. 2018. Learning a discriminative filter bank within a CNN for fine-grained recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4148-4157 [DOI: 10.1109/CVPR.2018.00436http://dx.doi.org/10.1109/CVPR.2018.00436]

Wang Z H, Wang S J, Li H J, Dou Z and Li J J. 2020. Graph-propagation based correlation learning for weakly supervised fine-grained image classification//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press: 12289-12296 [DOI: 10.1609/aaai.v34i07.6912http://dx.doi.org/10.1609/aaai.v34i07.6912]

Wei X S, Cui Q, Yang L, Wang P and Liu L Q. 2019a. RPC: a large-scale retail product checkout dataset [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1901.07249.pdfhttps://arxiv.org/pdf/1901.07249.pdf

Wei X S, Wu J X and Cui Q. 2019b. Deep learning for fine-grained image analysis: a survey [EB/OL]. [2021-03-31].https://arxiv.org/pdf/1907.03069.pdfhttps://arxiv.org/pdf/1907.03069.pdf

Wei X S, Xie C W, Wu J X and Shen C H. 2018. Mask-CNN: localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recognition, 76: 704-714 [DOI: 10.1016/j.patcog.2017.10.002]

Wei X S, Xu Y Y, Yao Y Z, Wei J, Xi S, Xu W Y, Zhang W D, Lyu X X, Fu D P, Li Q, Chen B Y, Guo H J, Xue T L, Jing H P, Wang Z H, Zhang T M and Zhang M W. 2020a. Tips and tricks for webly-supervised fine-grained recognition: learning from the WebFG 2020 challenge [EB/OL]. [2020-03-31].https://arxiv.org/pdf/2012.14672.pdfhttps://arxiv.org/pdf/2012.14672.pdf

Wei Y C, Tran S, Xu S X, Kang B and Springer M. 2020b. Deep learning for retail product recognition: challenges and techniques. Computational Intelligence and Neuroscience, 2020: #8875910 [DOI: 10.1155/2020/8875910]

Xiao T J, Xu Y C, Yang K Y, Zhang J X, Peng Y X and Zhang Z. 2015. The application of two-level attention models in deep convolutional neural network for fine-grained image classification//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 842-850 [DOI: 10.1109/CVPR.2015.7298685http://dx.doi.org/10.1109/CVPR.2015.7298685]

Xiong W, He Y T, Zhang Y X, Luo W H, Ma L and Luo J B. 2020. Fine-grained image-to-image transformation towards visual recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5839-5848 [DOI: 10.1109/CVPR42600.2020.00588]

Xu Z, Yang Y and Hauptmann A G. 2015. A discriminative CNN video representation for event detection//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1798-1807 [DOI: 10.1109/CVPR.2015.7298789]

Yan Y C, Ni B B and Yang X K. 2017. Fine-grained recognition via attribute-guided attentive feature aggregation//Proceedings of the 25th ACM International Conference on Multimedia. Mountain View, USA: ACM: 1032-1040 [DOI: 10.1145/3123266.3123358]

Yang C L, Xie L X, Su C and Yuille A L. 2019. Snapshot distillation: teacher-student optimization in one generation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2854-2863 [DOI: 10.1109/CVPR.2019.00297]

Yim J, Joo D, Bae J and Kim J. 2017. A gift from knowledge distillation: fast optimization, network minimization and transfer learning//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 7130-7138 [DOI: 10.1109/CVPR.2017.754]

Yin J H, Wu A C and Zheng W S. 2020. Fine-grained person re-identification. International Journal of Computer Vision, 128(6): 1654-1672 [DOI: 10.1007/s11263-019-01259-0]

Yu C J, Zhao X Y, Zheng Q, Zhang P and You X G. 2018. Hierarchical bilinear pooling for fine-grained visual recognition//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 595-610 [DOI: 10.1007/978-3-030-01270-0_35]

Zeiler M D and Fergus R. 2014. Visualizing and understanding convolutional networks//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 818-833 [DOI: 10.1007/978-3-319-10590-1_53]

Zhang H Y, Cisse M, Dauphin Y N and Lopez-Paz D. 2018c. mixup: beyond empirical risk minimization//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: [s. n.]

Zhang L F, Song J B, Gao A N, Chen J W, Bao C L and Ma K S. 2019. Be your own teacher: improve the performance of convolutional neural networks via self distillation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3712-3721 [DOI: 10.1109/ICCV.2019.00381]

Zhang N, Donahue J, Girshick R and Darrell T. 2014. Part-based R-CNNs for fine-grained category detection//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 834-849 [DOI: 10.1007/978-3-319-10590-1_54]

Zhang X P, Xiong H K, Zhou W G, Lin W Y and Tian Q. 2016a. Picking deep filter responses for fine-grained image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1134-1142 [DOI: 10.1109/CVPR.2016.128]

Zhang Y, Wei X S, Wu J X, Cai J F, Lu J B, Nguyen V A and Do M N. 2016b. Weakly supervised fine-grained categorization with part-based image representation. IEEE Transactions on Image Processing, 25(4): 1713-1725 [DOI: 10.1109/TIP.2016.2531289]

Zhang Y, Xiang T, Hospedales T M and Lu H C. 2018b. Deep mutual learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4320-4328 [DOI: 10.1109/CVPR.2018.00454]

Zhang Y B, Tang H and Jia K. 2018a. Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 241-256 [DOI: 10.1007/978-3-030-01237-3_15]

Zheng H L, Fu J L, Mei T and Luo J B. 2017. Learning multi-attention convolutional neural network for fine-grained image recognition//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5219-5227 [DOI: 10.1109/ICCV.2017.557]

Zheng H L, Fu J L, Zha Z J and Luo J B. 2019. Looking for the devil in the details: learning trilinear attention sampling network for fine-grained image recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5007-5016 [DOI: 10.1109/CVPR.2019.00515]

Zheng H L, Fu J L, Zha Z J, Luo J B and Mei T. 2020. Learning rich part hierarchies with progressive attention networks for fine-grained image recognition. IEEE Transactions on Image Processing, 29: 476-488 [DOI: 10.1109/TIP.2019.2921876]

Zhou B Y, Cui Q, Wei X S and Chen Z M. 2020. BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9716-9725 [DOI: 10.1109/CVPR42600.2020.00974]

Zhou F and Lin Y Q. 2016. Fine-grained image classification by exploring bipartite-graph labels//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1124-1133 [DOI: 10.1109/CVPR.2016.127]

Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2242-2251 [DOI: 10.1109/ICCV.2017.244]

Zhuang B H, Liu L Q, Li Y, Shen C H and Reid I. 2017. Attend in groups: a weakly-supervised deep learning framework for learning from web data//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2915-2924 [DOI: 10.1109/CVPR.2017.311]

Zhuang P Q, Wang Y L and Qiao Y. 2020. Learning attentive pairwise interaction for fine-grained classification//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press: 13130-13137 [DOI: 10.1609/aaai.v34i07.7016]
