A survey of image data augmentation methods for deep learning

Ma Dongao; Tang Ping; Zhao Lijun; Zhang Zheng

Published: 2021-03-19
DOI: 10.11834/jig.200089
2021 | Volume 26 | Number 3

Ma Dongao, Tang Ping, Zhao Lijun, Zhang Zheng (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)

Abstract

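Data, as the driving force of deep learning, is essential for model training. Sufficient training data not only alleviates overfitting during training but also enlarges the parameter search space, helping the model move closer to the global optimum. In many fields and tasks, however, obtaining enough training samples is difficult and costly, so data augmentation has become a common means of increasing the number of training samples. This paper reviews image data augmentation methods in deep learning. It surveys the data augmentation methods proposed in the deep learning field to alleviate model overfitting and, according to their underlying principles, divides them into four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. Taking image data as the main object of study, it further subdivides the algorithms in each class by their core idea and compares and analyzes their principles, applicable scenarios, strengths, and weaknesses, to help researchers choose suitable data augmentation methods according to the characteristics of their data and to provide a basis for researchers at home and abroad to apply and further develop data augmentation methods. For image data augmentation, single data warping methods fall mainly into five types: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. Multiple data mixing can be divided into mixing in image space and mixing in feature space. Methods that learn the data distribution are divided mainly by their use of generative adversarial networks and image style transfer. Typical methods that learn the augmentation strategy can be classified as metalearning-based or reinforcement learning-based. Data augmentation has become an important technique for advancing the application of deep learning in many fields; it effectively alleviates the overfitting caused by insufficient training data and further improves model accuracy. In practice, the most suitable methods can be selected and combined according to the characteristics of the data and task to form an effective data augmentation scheme, which in turn gives stronger impetus to the application of deep learning methods. Looking ahead, promising directions include using reinforcement learning to search for the optimal combination strategy for given data and tasks, using metalearning to adaptively learn optimal ways of warping and mixing data, using generative adversarial networks to fit the real data distribution more closely and sample high-quality unseen data, and using style transfer to explore conversion between multimodal data.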
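As a rough illustration of the single data warping class described in the abstract, the following sketch applies one simple operation from four of the five categories (geometric transformation, color space transformation, noise injection, and local erasing) to a small grayscale image. It is a minimal sketch, not the authors' implementation: the function names, the list-of-rows image representation, and the parameter values are all assumptions made for this example, and sharpness transformations are omitted for brevity.

```python
import random

# Assumption: an image is a list of rows of grayscale values in [0, 1].
# Real pipelines would typically use array libraries; plain lists keep
# this sketch dependency-free.


def horizontal_flip(img):
    """Geometric transformation: mirror each row left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img, factor):
    """Color space transformation: scale intensities, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in img]


def add_gaussian_noise(img, sigma, rng):
    """Noise injection: add zero-mean Gaussian noise, then clip to [0, 1]."""
    return [
        [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
        for row in img
    ]


def random_erase(img, size, rng, fill=0.0):
    """Local erasing: overwrite a randomly placed size x size square."""
    height, width = len(img), len(img[0])
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    out = [row[:] for row in img]         # copy so the input is not modified
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out


# Usage: chain several warps to derive one new sample from one original.
rng = random.Random(0)
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
augmented = random_erase(
    add_gaussian_noise(horizontal_flip(image), 0.05, rng), 2, rng
)
```

Each function returns a new image and leaves its input untouched, so the operations can be chained in any order or applied with some probability per sample.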

Keywords

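deep learning; overfitting; data augmentation; image transformation; generative adversarial network; metalearning; reinforcement learning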

Review of data augmentation for image in deep learning

Ma Dongao, Tang Ping, Zhao Lijun, Zhang Zheng(Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)

Abstract

Deep learning has had a tremendous influence on numerous research fields owing to its outstanding performance in representing high-level features of high-dimensional data. In the computer vision field especially, deep learning has shown powerful abilities in tasks such as image classification, object detection, and image segmentation. Normally, a deep learning-based method is built by designing a neural network architecture suited to the data and task, setting a reasonable task-oriented objective function, and using a large amount of labeled training data to compute the target loss and optimize the model parameters by gradient descent, which finally yields an "end-to-end" deep neural network model that performs the task. Data, as the driving force of deep learning, is essential for training the model. With sufficient data, the overfitting problem during training can be alleviated, and the parameter search space can be expanded such that the model can be further optimized toward the global optimal solution. However, in many areas and tasks, obtaining sufficient labeled samples for training a model is difficult and expensive. As a result, overfitting often occurs during training and prevents deep learning models from achieving higher performance. Many methods have been proposed to address this issue, and data augmentation has become one of the most important solutions because it increases the amount and variety of a limited data set. Numerous works have demonstrated the effectiveness of data augmentation in improving the performance of deep learning models, a practice that can be traced back to LeNet, the seminal work on convolutional neural networks. In this review, we examine the most representative image data augmentation methods for deep learning.
This review can help researchers adopt appropriate methods for their tasks and promote research progress in data augmentation. Current data augmentation methods that can relieve the overfitting problem in deep learning models are compared and analyzed. Based on differences in their internal mechanisms, a taxonomy of data augmentation methods is proposed with four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. First, for image data, single data warping generates new data by transforming an image in the spatial or spectral domain. These methods can be divided into five categories: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. They have long been widely used in image data augmentation because of their simplicity. Second, multiple data mixing can be divided into mixing in image space and mixing in feature space. The mixing modes include linear and nonlinear mixing of more than one image. Although mixing images seems counterintuitive as a data augmentation method, experiments in many works have shown that it improves the performance of deep learning models. Third, methods that learn the data distribution try to capture the underlying probability distribution of the training data and generate new samples by sampling from that distribution. This goal can be achieved by adversarial networks. Therefore, this kind of data augmentation method is mainly based on generative adversarial networks and the application of image-to-image translation. Fourth, methods that learn the augmentation strategy try to train a model to select the optimal data augmentation strategy adaptively according to the characteristics of the data or task. This goal can be achieved by metalearning, replacing data augmentation with a trainable neural network.
The strategy search problem can also be solved by reinforcement learning. When performing data augmentation in practical applications, researchers can select and combine the most suitable of the above methods according to the characteristics of their data and tasks to form an effective data augmentation scheme, which in turn gives stronger impetus to the application of deep learning methods by supplying more effective training data. Although a better data augmentation strategy can be obtained more intelligently by learning the data distribution or searching over data augmentation strategies, how to customize an optimal data augmentation scheme automatically for a given task remains to be studied. In the future, theoretical analysis and experimental verification of the suitability of various data augmentation methods for different data and tasks will be of great research significance and application value, and will enable researchers to customize an optimal data augmentation scheme for their task. A large gap remains in applying the idea of metalearning to data augmentation, that is, in constructing a "data augmentation network" that learns an optimal way of data warping or data mixing. Moreover, improving the ability of generative adversarial networks (GAN) to fit the data distribution more closely is important, because sampling from the real data space would be the ideal way to obtain an unlimited supply of unobserved new data. The real world contains numerous cross-domain and cross-modality data. The style transfer ability of encoder-decoder networks and GAN can formulate mapping functions between different data distributions and allow data from different domains to complement one another. Thus, exploring the application of "image-to-image translation" in different fields has bright prospects.
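The linear case of multiple data mixing mentioned in the abstract can be sketched as a convex combination of two images and their labels. The code below is a minimal sketch in the style of what is commonly called mixup, not a method taken from this paper: the function names, the list-based image and one-hot label representation, and the choice of a Beta(alpha, alpha) distribution for the mixing weight are assumptions made for illustration.

```python
import random


def mix_pair(img_a, label_a, img_b, label_b, lam):
    """Linearly blend two same-sized images and their one-hot labels.

    lam in [0, 1] is the weight given to the first sample.
    """
    mixed_img = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    mixed_label = [
        lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)
    ]
    return mixed_img, mixed_label


def sample_lambda(alpha, rng):
    """Assumption: draw the mixing weight from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


# Usage: blend a "class 0" image with a "class 1" image.
rng = random.Random(0)
lam = sample_lambda(0.4, rng)
image, label = mix_pair(
    [[1.0, 0.0]], [1.0, 0.0],   # first image and its one-hot label
    [[0.0, 1.0]], [0.0, 1.0],   # second image and its one-hot label
    lam,
)
```

Because the label is mixed with the same weight as the pixels, the resulting soft label still sums to one, which is what lets the blended sample be used with a standard cross-entropy loss.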

Keywords