Review of data augmentation for images in deep learning
Deep learning has had a tremendous influence on numerous research fields owing to its outstanding performance in representing high-level features of high-dimensional data. In computer vision in particular, deep learning has shown powerful abilities in various tasks such as image classification, object detection, and image segmentation. Typically, a deep learning-based method is built in four steps. First, a neural network architecture suited to the data and task is designed. Second, a reasonable task-oriented objective function is set. Third, the target loss is computed on a large amount of labeled training data and the model parameters are optimized by gradient descent. The result is an "end-to-end" deep neural network model that performs the task. Data, as the driving force of deep learning, are essential for training the model. With sufficient data, overfitting during training can be alleviated, and the parametric search space can be expanded so that the model can be further optimized toward the global optimal solution. However, in several areas or tasks, obtaining sufficient labeled samples to train a model is difficult and expensive. As a result, overfitting occurs often during training and prevents deep learning models from achieving higher performance. Many methods have therefore been proposed to address this issue, and data augmentation has become one of the most important solutions because it increases the amount and variety of a limited data set. Numerous works have demonstrated the effectiveness of data augmentation in improving the performance of deep learning models, a practice that can be traced back to LeNet, the seminal convolutional neural network. In this review, we examine the most representative image data augmentation methods for deep learning. 
This review can help researchers adopt appropriate methods for their tasks and promote research progress in data augmentation. Current data augmentation methods that can relieve overfitting in deep learning models are compared and analyzed. Based on differences in their internal mechanisms, a taxonomy of data augmentation methods is proposed with four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. First, for image data, single data warping generates new data by transforming an image in the spatial or spectral space. These methods can be divided into five categories: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. Owing to their simplicity, these methods have long been widely used in image data augmentation. Second, multiple data mixing can be divided into mixing in image space and mixing in feature space. The mixing modes include linear and nonlinear mixing of more than one image. Although mixing images seems a counter-intuitive way to augment data, experiments in many works have proven its effectiveness in improving the performance of deep learning models. Third, methods that learn the data distribution try to capture the underlying probability distribution of the training data and generate new samples by sampling from that distribution. This goal can be achieved by adversarial networks, so this kind of data augmentation method is mainly based on generative adversarial networks and the application of image-to-image translation. Fourth, methods that learn the augmentation strategy try to train a model to adaptively select the optimal data augmentation strategy according to the characteristics of the data or task. This goal can be achieved by meta-learning, which replaces data augmentation with a trainable neural network. 
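The single data warping and multiple data mixing operations described above can be illustrated with a few NumPy functions, one per category. This is a minimal sketch under stated assumptions. Images are float arrays of shape (H, W, C) with values in [0, 1], and labels are one-hot vectors. The function names and default parameters (flip probability, brightness range, noise sigma, erase size, Beta alpha) are illustrative choices and are not taken from the review. `mixup` is a linear mix. `cutmix` is a simplified CutMix-style nonlinear mix that keeps the pasted box entirely inside the image.

```python
import numpy as np

def random_hflip(img, rng, p=0.5):
    """Geometric transformation: mirror the image left-right with probability p."""
    return img[:, ::-1, :] if rng.random() < p else img

def color_jitter(img, rng, max_delta=0.1):
    """Color space transformation: shift brightness by one random offset, then clip."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

def box_blur(img):
    """Sharpness transformation: 3x3 mean filter with edge padding (blurs the image)."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def gaussian_noise(img, rng, sigma=0.05):
    """Noise injection: add zero-mean Gaussian noise to every pixel, then clip."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)

def random_erase(img, rng, size=8):
    """Local erasing: zero out a randomly placed size x size square (Cutout-style)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out = img.copy()
    out[top:top + size, left:left + size, :] = 0.0
    return out

def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Linear mixing: the same convex weight lam is applied to images and labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, y1, x2, y2, rng, alpha=1.0):
    """Nonlinear mixing: paste a box from x2 into x1; labels weighted by pasted area."""
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    ch, cw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = x1.copy()
    out[top:top + ch, left:left + cw, :] = x2[top:top + ch, left:left + cw, :]
    area = (ch * cw) / (h * w)  # fraction of pixels that now come from x2
    return out, (1 - area) * y1 + area * y2
```

Each warping function takes one image, while the two mixing functions take two (image, label) pairs and also mix the labels. The mixing functions are what distinguish them from warping: the training target changes along with the input.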
The strategy search problem can also be solved by reinforcement learning. In practical applications, researchers can select and combine the most suitable of the above methods according to the characteristics of their data and tasks to form an effective data augmentation scheme. Such a scheme yields more effective training data and thus gives stronger support for applying deep learning methods. Better data augmentation strategies can be obtained more intelligently by learning the data distribution or searching for augmentation strategies. However, how to automatically customize an optimal data augmentation scheme for a given task remains an open problem. In the future, theoretical analysis and experimental verification of the suitability of various data augmentation methods for different data and tasks will be of great research significance and application value, and will enable researchers to customize an optimal data augmentation scheme for their task. A large gap remains in applying meta-learning to data augmentation, for example by constructing a "data augmentation network" that learns an optimal way of warping or mixing data. Moreover, improving the ability of generative adversarial networks (GANs) to fit the data distribution is important. Sampling from the real data distribution would be the ideal way to obtain an unlimited amount of unobserved new data. The real world contains abundant cross-domain and cross-modality data. The style transfer ability of encoder-decoder networks and GANs can formulate mapping functions between different data distributions and make data from different domains complement each other. Thus, exploring the application of "image-to-image translation" in different fields has bright prospects.
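The idea of combining several augmentation methods into one scheme, and then searching for a good scheme, can be sketched as follows. The sketch composes (operation, probability) pairs into a single augmentation function and tunes the probabilities by search. This is a minimal sketch under stated assumptions. The three operations, `make_scheme`, `random_search`, and the `score_fn` callback are hypothetical. `score_fn` stands in for a measure such as validation accuracy after training with the scheme. Plain random search is used here in place of the reinforcement-learning controller that strategy-search methods employ.

```python
import numpy as np

# Hypothetical operations sharing one (img, rng) interface; images are
# float arrays of shape (H, W, C) in [0, 1].
def hflip(img, rng):
    return img[:, ::-1, :]  # rng unused; kept for a uniform signature

def brightness(img, rng):
    return np.clip(img + rng.uniform(-0.1, 0.1), 0.0, 1.0)

def noise(img, rng):
    return np.clip(img + rng.normal(0.0, 0.05, size=img.shape), 0.0, 1.0)

def make_scheme(ops_with_probs):
    """Compose (op, probability) pairs into one augmentation function.
    Each op is applied independently, in order, with its own probability."""
    def apply(img, rng):
        for op, p in ops_with_probs:
            if rng.random() < p:
                img = op(img, rng)
        return img
    return apply

def random_search(ops, score_fn, n_trials, rng):
    """Sample random probability vectors and keep the best-scoring scheme.
    score_fn is a hypothetical callback mapping a scheme to a scalar score."""
    best_probs, best_score = None, -np.inf
    for _ in range(n_trials):
        probs = rng.uniform(0.0, 1.0, size=len(ops))
        score = score_fn(make_scheme(list(zip(ops, probs))))
        if score > best_score:
            best_probs, best_score = probs, score
    return best_probs, best_score
```

In a real setting, each call to `score_fn` would involve training and evaluating a model, which makes every trial expensive. That cost is one reason learned search strategies, such as reinforcement learning, are attractive compared with blind sampling.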