Review of image data augmentation in deep learning
Vol. 26, Issue 3, Pages 487-502 (2021)
Received: 03 April 2020
Revised: 28 June 2020
Accepted: 05 July 2020
Published: 16 March 2021
DOI: 10.11834/jig.200089
Data, as the driving force of deep learning, are essential for model training. Sufficient training data not only alleviate overfitting during training but also expand the parameter search space, helping the model optimize further toward the globally optimal solution. However, in many fields or tasks, obtaining sufficient training samples is difficult and costly, so data augmentation has become a common means of increasing the number of training samples. This paper surveys image data augmentation methods in deep learning, reviewing the various augmentation methods proposed to alleviate model overfitting. According to their underlying principles, these methods are divided into four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. Taking image data as the main research object, each class is further subdivided by core idea, and the principles, applicable scenarios, and advantages and disadvantages of the methods are compared and analyzed, helping researchers choose suitable augmentation methods according to the characteristics of their data and providing a foundation for researchers at home and abroad to apply and further develop data augmentation. For image data, single data warping can be divided into five types: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. Multiple data mixing can be divided into mixing in image space and mixing in feature space. Methods for learning the data distribution are mainly based on generative adversarial networks and applications of image style transfer. Typical methods for learning the augmentation strategy can be classified as based on meta-learning or on reinforcement learning. Data augmentation has become an important technology for advancing the application of deep learning in various fields: it can effectively alleviate the overfitting caused by insufficient training data and further improve model accuracy. In practice, the most suitable methods can be selected and combined according to the characteristics of the data and task to form an effective data augmentation scheme, thereby providing stronger impetus for applying deep learning methods. In the future, promising research directions include exploring optimal combination strategies with reinforcement learning according to the data and task, adaptively learning optimal data warping and mixing with meta-learning, further fitting the real data distribution with generative adversarial networks to sample high-quality unseen data, and exploring applications of style transfer for converting multimodal data into each other; these directions are well worth exploring and have broad prospects.
Deep learning has a tremendous influence on numerous research fields owing to its outstanding performance in representing high-level features of high-dimensional data. In the computer vision field especially, deep learning has shown powerful abilities in various tasks such as image classification, object detection, and image segmentation. Normally, when constructing and using a deep learning-based method, a suitable neural network architecture is designed for the data and task, a reasonable task-oriented objective function is set, and a large amount of labeled training data is used to calculate the target loss and optimize the model parameters by gradient descent, finally yielding an "end-to-end" deep neural network model that performs the task.
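As a concrete illustration of this pipeline, the following minimal sketch shows such a supervised training loop in PyTorch; the architecture, objective function, and optimizer are generic placeholders chosen for illustration, not methods prescribed by this review.

```python
# A minimal sketch of the standard supervised training loop described above:
# a task-specific architecture, a task-oriented loss, and gradient-descent
# updates over labeled data. Model, loss, and optimizer are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in "suitable architecture"
    nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
)
criterion = nn.CrossEntropyLoss()   # task-oriented objective function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_epoch(loader):
    """One pass over the labeled training data."""
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # target loss
        loss.backward()                          # gradients via backprop
        optimizer.step()                         # gradient-descent update
```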
Data, as the driving force of deep learning, are essential for training the model. With sufficient data, the overfitting problem during training can be alleviated, and the parametric search space can be expanded such that the model can be further optimized toward the globally optimal solution. However, in several areas or tasks, attaining sufficient labeled samples for training a model is difficult and expensive. As a result, overfitting often occurs during training and prevents deep learning models from achieving higher performance. Thus, many methods have been proposed to address this issue, and data augmentation has become one of the most important solutions, increasing the amount and variety of a limited data set. Innumerable works have proven the effectiveness of data augmentation for improving the performance of deep learning models, a practice that can be traced back to LeNet, the seminal work on convolutional neural networks. In this review, we examine the most representative image data augmentation methods for deep learning, to help researchers adopt appropriate methods for their tasks and to promote the research progression of data augmentation. The diverse current data augmentation methods that can relieve overfitting in deep learning models are compared and analyzed. Based on differences in internal mechanism, a taxonomy of data augmentation methods is proposed with four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. First, for image data, single data warping generates new data by image transformation over spatial or spectral space. These methods can be divided into five categories: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. They have long been widely used for image data augmentation because of their simplicity, as the sketch below illustrates.
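The following minimal sketch implements one representative operation from each of the five warping families using Pillow and NumPy; the particular operations, ranges, and parameter values are illustrative assumptions, not settings taken from the surveyed methods.

```python
# One illustrative operation per single-data-warping family.
import random
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def geometric(img: Image.Image) -> Image.Image:
    # Geometric transformation: random horizontal flip plus a small rotation.
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img.rotate(random.uniform(-15, 15))

def color_space(img: Image.Image) -> Image.Image:
    # Color space transformation: jitter brightness and saturation.
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    return ImageEnhance.Color(img).enhance(random.uniform(0.7, 1.3))

def sharpness(img: Image.Image) -> Image.Image:
    # Sharpness transformation: blur or sharpen at random.
    if random.random() < 0.5:
        return img.filter(ImageFilter.GaussianBlur(radius=1))
    return ImageEnhance.Sharpness(img).enhance(2.0)

def noise_injection(img: Image.Image, sigma: float = 10.0) -> Image.Image:
    # Noise injection: additive Gaussian noise in pixel space.
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def local_erasing(img: Image.Image, frac: float = 0.2) -> Image.Image:
    # Local erasing (cutout/random-erasing style): zero out a random patch.
    arr = np.asarray(img).copy()
    h, w = arr.shape[:2]
    eh, ew = int(h * frac), int(w * frac)
    y, x = random.randrange(h - eh), random.randrange(w - ew)
    arr[y:y + eh, x:x + ew] = 0
    return Image.fromarray(arr)
```

In practice, several such operations are usually applied stochastically during training so that each epoch sees a slightly different version of every image.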
Second, multiple data mixing can be divided into mixing in image space and mixing in feature space; the mixing modes include linear and nonlinear mixing of more than one image. Although mixing images seems a counterintuitive approach to data augmentation, experiments in many works have proven its effectiveness in improving the performance of deep learning models.
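As an example of linear image-space mixing, the following sketch follows the spirit of mixup (Zhang et al., 2017): two images and their one-hot labels are combined convexly with a Beta-distributed coefficient. The alpha default is an illustrative choice.

```python
# Minimal mixup-style linear mixing of two (image, one-hot label) pairs.
import numpy as np

def mixup(x1: np.ndarray, y1: np.ndarray,
          x2: np.ndarray, y2: np.ndarray,
          alpha: float = 0.2):
    """Return a mixed sample (x, y) from two (image, one-hot label) pairs."""
    lam = np.random.beta(alpha, alpha)   # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2      # pixel-space linear mix
    y = lam * y1 + (1.0 - lam) * y2      # label mix yields soft targets
    return x, y
```

In typical implementations, mixing is applied per batch by pairing each sample with a randomly permuted partner from the same batch.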
Third, methods that learn the data distribution try to capture the underlying probability distribution of the training data and generate new samples by sampling from that distribution. This goal can be achieved with adversarial networks; therefore, this kind of data augmentation is mainly based on generative adversarial networks and applications of image-to-image translation.
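The following sketch outlines the sampling step of GAN-based augmentation: once a generator has been fitted to the training distribution, new samples are drawn by decoding random latent vectors. The toy DCGAN-style generator is a placeholder for whatever model would actually be trained; its architecture and sizes are assumptions for illustration.

```python
# Augmentation by sampling a learned data distribution with a GAN generator.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy DCGAN-style generator mapping a latent vector z to a 32x32 RGB image."""
    def __init__(self, z_dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z.view(z.size(0), -1, 1, 1))

@torch.no_grad()
def sample_synthetic_batch(gen: Generator, n: int, z_dim: int = 100) -> torch.Tensor:
    """Sample n synthetic images to append to the real training set."""
    gen.eval()
    z = torch.randn(n, z_dim)   # latent prior N(0, I)
    return gen(z)
```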
Fourth, methods that learn the augmentation strategy train a model to select the optimal data augmentation strategy adaptively according to the characteristics of the data or task. This goal can be achieved by meta-learning, replacing hand-crafted augmentation with a trainable neural network; the strategy search problem can also be solved by reinforcement learning.
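As an example of a strategy with a deliberately reduced search space, the following sketch follows the spirit of RandAugment (Cubuk et al., 2019b): instead of learning a full policy, N operations are drawn at random and applied at a shared magnitude M, so strategy search reduces to tuning (N, M). The operation list here is a small illustrative subset.

```python
# RandAugment-style policy: apply n random ops at a shared magnitude.
import random
from PIL import Image, ImageEnhance

OPS = [
    lambda img, m: img.rotate(30 * m),                              # rotate up to 30 deg
    lambda img, m: ImageEnhance.Contrast(img).enhance(1 + m),       # raise contrast
    lambda img, m: ImageEnhance.Brightness(img).enhance(1 + m),     # raise brightness
    lambda img, m: ImageEnhance.Sharpness(img).enhance(1 + 2 * m),  # sharpen
]

def rand_augment(img: Image.Image, n: int = 2, magnitude: float = 0.3) -> Image.Image:
    """Apply n randomly chosen operations, each at the shared magnitude."""
    for op in random.choices(OPS, k=n):
        img = op(img, magnitude)
    return img
```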
When performing data augmentation in practical applications, researchers can select and combine the most suitable of the above methods according to the characteristics of their data and tasks, forming an effective data augmentation scheme that in turn supplies more effective training data for deep learning methods. Although a better augmentation strategy can be obtained more intelligently by learning the data distribution or by searching augmentation strategies, how to customize an optimal data augmentation scheme automatically for a given task remains to be studied. In the future, theoretical analysis and experimental verification of the suitability of various data augmentation methods for different data and tasks will be of great research significance and application value, enabling researchers to customize an optimal augmentation scheme for their tasks. A large gap also remains in applying the idea of meta-learning to data augmentation, that is, constructing a "data augmentation network" that learns an optimal way of warping or mixing data. Moreover, improving the ability of generative adversarial networks (GANs) to fit the data distribution is important, because sampling from the real data space would be the ideal way of obtaining unobserved new data without limit. Finally, the real world contains abundant cross-domain and cross-modality data; the style transfer ability of encoder-decoder networks and GANs can establish mapping functions between different data distributions and achieve complementation of data across domains. Thus, exploring the application of image-to-image translation in different fields has bright prospects.
Brock A, Donahue J and Simonyan K. 2018. Large scale GAN training for high fidelity natural image synthesis[EB/OL]. 2018-09-28[2020-03-03]. https://arxiv.org/pdf/1809.11096.pdf
Chawla N V, Bowyer K W, Hall L O and Kegelmeyer W P. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1): 321-357[DOI: 10.1613/jair.953]
Chen P G, Liu S, Zhao H S and Jia J Y. 2020. GridMask data augmentation[EB/OL]. 2020-01-13[2020-03-03]. https://arxiv.org/pdf/2001.04086.pdf
Cubuk E D, Zoph B, Mané D, Vasudevan V and Le Q V. 2019a. AutoAugment: learning augmentation strategies from data//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 113-123[DOI: 10.1109/CVPR.2019.00020]
Cubuk E D, Zoph B, Shlens J and Le Q V. 2019b. RandAugment: practical automated data augmentation with a reduced search space[EB/OL]. 2019-09-30[2020-03-03]. https://arxiv.org/pdf/1909.13719.pdf
DeVries T and Taylor G W. 2017a. Dataset augmentation in feature space[EB/OL]. 2017-02-17[2020-03-03]. https://arxiv.org/pdf/1702.05538.pdf
DeVries T and Taylor G W. 2017b. Improved regularization of convolutional neural networks with cutout[EB/OL]. 2017-08-15[2020-03-03]. https://arxiv.org/pdf/1708.04552.pdf
Erhan D, Bengio Y, Courville A, Manzagol P A, Vincent P and Bengio S. 2010. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11: 625-660[DOI: 10.5555/1756006.1756025]
Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J and Greenspan H. 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321: 321-331[DOI: 10.1016/j.neucom.2018.09.013]
Gatys L A, Ecker A S and Bethge M. 2015. A neural algorithm of artistic style[EB/OL]. 2015-08-26[2020-03-03]. https://arxiv.org/pdf/1508.06576.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hiasa Y, Otake Y, Takao M, Matsuoka T, Takashima K, Carass A, Prince J L, Sugano N and Sato Y. 2018. Cross-modality image synthesis from unpaired data using CycleGAN//Gooya A, Goksel O, Oguz I and Burgos N, eds. Simulation and Synthesis in Medical Imaging. Cham: Springer: 31-41[DOI: 10.1007/978-3-030-00536-8_4]
Huang G, Liu Z, Maaten L V D and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/CVPR.2017.243]
Inoue H. 2018. Data augmentation by pairing samples for images classification[EB/OL]. 2018-01-09[2020-03-03]. https://arxiv.org/pdf/1801.02929.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. 2015-02-11[2020-03-03]. https://arxiv.org/pdf/1502.03167.pdf
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5967-5976[DOI: 10.1109/CVPR.2017.632]
Jackson P T, Atapour-Abarghouei A, Bonner S, Breckon T and Obara B. 2018. Style augmentation: data augmentation via style randomization[EB/OL]. 2018-09-14[2020-03-03]. https://arxiv.org/pdf/1809.05375.pdf
Jurio A, Pagola M, Galar M, Lopez-Molina C and Paternain D. 2010. A comparison study of different color spaces in clustering based image segmentation//Hüllermeier E, Kruse R and Hoffmann F, eds. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. Berlin, Heidelberg: Springer: 532-541[DOI: 10.1007/978-3-642-14058-7_55]
Kang G L, Dong X Y, Zheng L and Yang Y. 2017. PatchShuffle regularization[EB/OL]. 2017-07-22[2020-03-03]. https://arxiv.org/pdf/1707.07103.pdf
Karras T, Aila T, Laine S and Lehtinen J. 2017. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. 2017-12-27[2020-03-03]. https://arxiv.org/pdf/1710.10196.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90[DOI: 10.1145/3065386]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324[DOI: 10.1109/5.726791]
LeCun Y, Bengio Y and Hinton G. 2015. Deep learning. Nature, 521: 436-444[DOI: 10.1038/nature14539]
Lemley J, Bazrafkan S and Corcoran P. 2017. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5: 5858-5869[DOI: 10.1109/ACCESS.2017.2696121]
Li S T, Chen Y K, Peng Y L and Bai L. 2018. Learning more robust features with adversarial training[EB/OL]. 2018-04-20[2020-03-03]. https://arxiv.org/pdf/1804.07757.pdf
Ma D G, Tang P and Zhao L J. 2019. SiftingGAN: generating and sifting labeled samples to improve the remote sensing image scene classification baseline in vitro. IEEE Geoscience and Remote Sensing Letters, 16(7): 1046-1050[DOI: 10.1109/LGRS.2018.2890413]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets[EB/OL]. 2014-11-06[2020-03-03]. https://arxiv.org/pdf/1411.1784.pdf
Moosavi-Dezfooli S M, Fawzi A and Frossard P. 2016. DeepFool: a simple and accurate method to fool deep neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2574-2582[DOI: 10.1109/CVPR.2016.282]
Moreno-Barea F J, Strazzera F, Jerez J M, Urda D and Franco L. 2018. Forward noise adjustment scheme for data augmentation//Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI). Bangalore, India: IEEE: 728-734[DOI: 10.1109/SSCI.2018.8628917]
Perez L and Wang J. 2017. The effectiveness of data augmentation in image classification using deep learning[EB/OL]. 2017-12-13[2020-03-03]. https://arxiv.org/pdf/1712.04621.pdf
Radford A, Metz L and Chintala S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. 2015-11-19[2020-03-03]. https://arxiv.org/pdf/1511.06434.pdf
Shorten C and Khoshgoftaar T M. 2019. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1): 1-48[DOI: 10.1186/s40537-019-0197-0]
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL]. 2014-09-04[2020-03-03]. https://arxiv.org/pdf/1409.1556.pdf
Singh K K and Lee Y J. 2017. Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3544-3553[DOI: 10.1109/ICCV.2017.381]
Srivastava N, Hinton G, Krizhevsky A, Sutskever I and Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1): 1929-1958[DOI: 10.5555/2627435.2670313]
Su J W, Vargas D V and Sakurai K. 2019. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5): 828-841[DOI: 10.1109/TEVC.2019.2890858]
Summers C and Dinneen M J. 2019. Improved mixed-example data augmentation//Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa Village, USA: IEEE: 1262-1270[DOI: 10.1109/WACV.2019.00139]
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208[DOI: 10.1109/CVPR.2018.00131]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I and Fergus R. 2013. Intriguing properties of neural networks[EB/OL]. 2013-12-21[2020-03-03]. https://arxiv.org/pdf/1312.6199.pdf
Takahashi R, Matsubara T and Uehara K. 2019. Data augmentation using random image cropping and patching for deep CNNs. IEEE Transactions on Circuits and Systems for Video Technology, 30(9): 2917-2931[DOI: 10.1109/TCSVT.2019.2935128]
Taylor L and Nitschke G. 2017. Improving deep learning using generic data augmentation[EB/OL]. 2017-08-20[2020-03-03]. https://arxiv.org/pdf/1708.06020.pdf
Tobin J, Fong R, Ray A, Schneider J, Zaremba W and Abbeel P. 2017. Domain randomization for transferring deep neural networks from simulation to the real world//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver, Canada: IEEE: 23-30[DOI: 10.1109/IROS.2017.8202133]
Tokozume Y, Ushiku Y and Harada T. 2018. Between-class learning for image classification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5486-5494[DOI: 10.1109/CVPR.2018.00575]
Wang L, Xu X, Yu Y, Yang R, Gui R, Xu Z Z and Pu F L. 2019. SAR-to-optical image translation using supervised cycle-consistent adversarial networks. IEEE Access, 7: 129136-129149[DOI: 10.1109/ACCESS.2019.2939649]
Weiss K, Khoshgoftaar T M and Wang D D. 2016. A survey of transfer learning. Journal of Big Data, 3(1): 1-40[DOI: 10.1186/s40537-016-0043-6]
Wong S C, Gatt A, Stamatescu V and McDonnell M D. 2016. Understanding data augmentation for classification: when to warp?//Proceedings of 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). Gold Coast, Australia: IEEE: 1-6[DOI: 10.1109/DICTA.2016.7797091]
Xie L X, Wang J D, Wei Z, Wang M and Tian Q. 2016. DisturbLabel: regularizing CNN on the loss layer//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4753-4762[DOI: 10.1109/CVPR.2016.514]
Zhang H Y, Cisse M, Dauphin Y N and Lopez-Paz D. 2017. Mixup: beyond empirical risk minimization[EB/OL]. 2017-10-25[2020-03-03]. https://arxiv.org/pdf/1710.09412.pdf
Zheng Z D, Zheng L and Yang Y. 2017. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3774-3782[DOI: 10.1109/ICCV.2017.405]
Zhong Z, Zheng L, Kang G L, Li S Z and Yang Y. 2017. Random erasing data augmentation[EB/OL]. 2017-08-16[2020-03-03]. https://arxiv.org/pdf/1708.04896.pdf
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2242-2251[DOI: 10.1109/ICCV.2017.244]
Zhu X Y, Liu Y F, Li J H, Wan T and Qin Z C. 2018. Emotion classification with data augmentation using generative adversarial networks//Phung D, Tseng V, Webb G, Ho B, Ganji M and Rashidi L, eds. Advances in Knowledge Discovery and Data Mining. Cham, Switzerland: Springer: 349-360[DOI: 10.1007/978-3-319-93040-4_28]