A survey of fine-grained image recognition with webly-supervised data

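Wei Xiushen1,2,3, Xu Yuyan1,2,3, Yang Jian1,2,3 (1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094;
2. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094;
3. Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094)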

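Abstract
Fine-grained image recognition aims to visually recognize different sub-categories at the fine-grained level within a traditional semantic category. It has important scientific significance and application value in fields such as the smart new economy and the industrial Internet of Things (for example, smart cities, public safety, ecological protection, and agricultural production and security). Fine-grained image recognition has made great progress with the help of deep learning, but its dependence on large-scale, high-quality fine-grained image data has become a bottleneck restricting its promotion and popularization. With the rapid development of the internet and big data, webly-supervised image data, as a free data source, has become a feasible way to ease deep learning's dependence on big data, and how to use webly-supervised data effectively has become a popular topic for improving the applicability and generalization of fine-grained image recognition. This paper centers on fine-grained image recognition, with a focus on fine-grained recognition under webly-supervised data. It introduces, in turn, fine-grained recognition datasets, traditional fine-grained recognition methods, and the characteristics and methods of webly-supervised fine-grained recognition, and it reviews the world's first webly-supervised fine-grained image recognition competition and its champion solution. Finally, based on the above, it summarizes and discusses future trends in the field.
Keywords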
Review of webly-supervised fine-grained image recognition

Wei Xiushen1,2,3, Xu Yuyan1,2,3, Yang Jian1,2,3 (1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China;
2. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094, China;
3. Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094, China)

Abstract
Fine-grained image recognition aims to visually recognize different sub-categories at the fine-grained level within a traditional semantic category. In many scenarios, such as smart cities, public safety, ecological protection, and agriculture, fine-grained image recognition has important scientific significance and application value. In recent years, fine-grained image recognition has made great progress with the help of deep learning, but its reliance on large-scale, high-quality fine-grained image data has become a bottleneck restricting its promotion and popularization. This review covers traditional fine-grained image recognition, fine-grained recognition under the webly-supervised setting, the characteristics of fine-grained recognition datasets, and the challenges of and approaches to webly-supervised fine-grained recognition. We introduce, in turn, traditional fine-grained image recognition datasets, traditional webly-supervised image recognition datasets, and webly-supervised fine-grained image recognition datasets. Webly-supervised datasets share with traditional fine-grained datasets the properties of large intra-class differences and small inter-class differences. In addition, webly-supervised datasets pose the challenges of noise, data bias, and long-tailed distribution. Traditional fine-grained recognition follows three core paradigms: recognition based on localization-classification sub-networks, recognition via end-to-end feature encoding, and recognition with external information. Because the data in webly-supervised datasets are collected from the internet, they contain a large amount of noisy data, which can impair the training of deep models. 
Noisy data can be divided into two categories: irrelevant data and ambiguous data. Irrelevant data are erroneous samples that belong to none of the known categories, such as maps, tables, and article screenshots. Ambiguous data are images that contain objects related to the labeled category together with other objects. There are two main approaches to handling noisy data: clustering and cross validation. We introduce the key clustering methods, analyze their advantages and disadvantages, and discuss their results and potential on webly-supervised fine-grained images. For cross validation, we briefly introduce traditional cross validation and describe a customized cross validation method used in the ACCV (Asian Conference on Computer Vision) WebFG (webly-supervised fine-grained image recognition) 2020 competition. On the internet, data are generated and uploaded by users according to their own perceptions, so the data are biased by factors such as culture, politics, and environment. Because fine-grained categories are so similar to one another, data bias is particularly pronounced in fine-grained datasets. The main methods for reducing data bias are knowledge distillation, label smoothing, and data augmentation. Data bias in webly-supervised datasets affects model training, and the dark knowledge produced by knowledge distillation can mitigate it. Knowledge distillation has three learning schemes: offline distillation, online distillation, and self-distillation. Label smoothing reduces the model's reliance on hard labels and thereby lessens the impact of erroneous labels on model training. Because of data bias, neither the quantity nor the quality of the data can be guaranteed. An effective way to alleviate data bias is to increase the number of samples in the dataset. 
However, because the differences between fine-grained categories are small, the accuracy of manually added data cannot be guaranteed. Data augmentation therefore becomes an effective way to handle data bias in fine-grained datasets. Only a small number of fine-grained categories are commonly seen in daily life, and many others are rarely seen at all. Because the internet reflects the natural state of everyday life, data collected from the internet follow a long-tailed distribution, which is a further challenge in real-world internet scenarios. The main solutions for long-tailed recognition are resampling, reweighting, and novel network structures. Resampling weights the images of each category inversely to its number of samples, which leads to two methods: under-sampling the head categories and over-sampling the tail categories. Reweighting acts mainly on the loss function: a larger penalty weight is assigned to the loss of the tail categories. Novel network structures decouple the network and train its parts separately, decomposing the learning process into representation learning and classifier learning. Experiments indicate that such structures have advantages over resampling and reweighting. In particular, we review and discuss WebFG, the world's first webly-supervised fine-grained image recognition competition, which was held with Nanjing University of Science and Technology, along with its champion solutions.
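The resampling and reweighting ideas described in the abstract can be illustrated with a minimal sketch. This is not the method of the surveyed works or of the competition entries. The function names (`inverse_frequency_weights`, `weighted_cross_entropy`, `resample_indices`) and the choice of normalizing weights to a mean of 1 across classes are assumptions made here for illustration only.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class count.

    Assumption for this sketch: weights are rescaled so that their
    mean over classes is 1.
    """
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


def weighted_cross_entropy(probs, label, class_weights):
    """Reweighting: scale a sample's cross-entropy by its true class weight,
    so that errors on tail classes are penalized more heavily."""
    return -class_weights[label] * math.log(probs[label])


def resample_indices(labels, num_samples, seed=0):
    """Resampling: draw sample indices with probability inversely
    proportional to class size, which over-samples tail classes and
    under-samples head classes."""
    weights = inverse_frequency_weights(labels)
    rng = random.Random(seed)
    per_sample = [weights[y] for y in labels]
    return rng.choices(range(len(labels)), weights=per_sample, k=num_samples)


if __name__ == "__main__":
    # Hypothetical long-tailed toy label set: 90 head samples, 10 tail samples.
    labels = ["head"] * 90 + ["tail"] * 10
    w = inverse_frequency_weights(labels)
    drawn = resample_indices(labels, num_samples=2000)
    tail_share = sum(labels[i] == "tail" for i in drawn) / len(drawn)
    print(w, round(tail_share, 2))
```

In this toy setting the tail class makes up 10% of the data but receives a loss weight nine times that of the head class, and the resampled index stream draws the two classes at roughly equal rates.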
Keywords
