Survey on neural architecture search
2021, Vol. 26, No. 2, pp. 245-264
Received: 2020-05-24; Revised: 2020-07-16; Accepted: 2020-07-23; Published in print: 2021-02-16
DOI: 10.11834/jig.200202
Deep neural networks have made great progress in artificial intelligence tasks such as image recognition, speech recognition, and machine translation, which is largely attributable to well-designed network architectures. Most neural networks are designed by hand, which requires expert machine learning knowledge and extensive trial and error. Automated neural architecture search has therefore become a research hotspot. Neural architecture search (NAS) consists of three main components: the search space, the search strategy, and the performance estimation method. In search space design, to limit computation, the entire network architecture is usually not searched directly; instead, the network is first divided into several blocks, and the structure within each block is searched. Depending on the application, the searched structure can be shared across blocks, or each block can be searched separately for a distinct structure. For search strategies, the mainstream optimization methods include reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization. For performance estimation, to save computation time, each candidate network is usually not fully trained to convergence; instead, techniques such as weight sharing and early stopping are used to minimize the training time of each network. Compared with hand-designed networks, deep neural networks obtained by neural architecture search achieve better performance. On the ImageNet classification task, MobileNetV3, obtained through neural architecture search, reduces computation by nearly 30% compared with the hand-designed MobileNetV2 while improving top-1 classification accuracy by 3.2%. On the Cityscapes semantic segmentation task, Auto-DeepLab-L, obtained through neural architecture search, achieves a higher mean intersection over union (mIoU) than the hand-designed DeepLabv3+ without ImageNet pretraining, while using less than half the computation. Deep neural networks obtained by neural architecture search generally outperform hand-designed networks and represent the future direction of neural network design.
Deep neural networks (DNNs) have achieved remarkable progress over the past years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One of the most crucial aspects of this progress is novel neural architectures, in which hierarchical feature extractors are learned from data in an end-to-end manner rather than designed manually. Neural network training can be considered an automatic feature engineering process, and its success has been accompanied by an increasing demand for architecture engineering. At present, most neural networks are developed by human experts; however, the process involved is time-consuming and error-prone. Consequently, interest in automated neural architecture search methods has increased in recent years. Neural architecture search (NAS) can be regarded as a subfield of automated machine learning, and it significantly overlaps with hyperparameter optimization and meta-learning. Neural architecture search can be categorized along three dimensions: search space, search strategy, and performance estimation strategy.
The search space defines which architectures can be represented in principle, and its choice largely determines the difficulty of optimization and the overall search time. To reduce search time, neural architecture search is typically not applied to the entire network; instead, the network is divided into several blocks, and the search space is designed inside the blocks. All the blocks are then combined into a whole neural network by using a predefined paradigm. In this manner, the search space can be significantly reduced, saving search time. Depending on the situation, the architecture of the searched block can be shared across blocks or not. If the architecture is not shared, then every block has a unique architecture; otherwise, all the blocks in the network share the same architecture, which reduces search time further.
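As a concrete illustration, the following Python sketch encodes such a block-based search space, in which every block picks one operation from a fixed candidate set; the operation names, block count, and sampling routine here are illustrative assumptions rather than a method from any specific paper. With five candidate operations and eight blocks, the unshared space contains 5^8 = 390 625 architectures, whereas the shared variant contains only five.

import random

# Hypothetical candidate operations for each searchable block.
CANDIDATE_OPS = ["conv_3x3", "conv_5x5", "depthwise_3x3", "max_pool_3x3", "skip_connect"]
NUM_BLOCKS = 8  # assumed number of blocks in the network

def sample_architecture(share_blocks=False):
    """Encode an architecture as one operation name per block.

    With share_blocks=True, one searched block is reused everywhere,
    shrinking the space from len(CANDIDATE_OPS)**NUM_BLOCKS down to
    len(CANDIDATE_OPS)."""
    if share_blocks:
        op = random.choice(CANDIDATE_OPS)
        return [op] * NUM_BLOCKS
    return [random.choice(CANDIDATE_OPS) for _ in range(NUM_BLOCKS)]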
The search strategy details how the search space is explored. Many strategies can be used to explore the space of neural architectures, including random search, reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization. A search strategy embodies the classical exploration-exploitation trade-off.
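As one example of a search strategy, the sketch below runs a simple evolutionary search over the encoding from the previous sketch, loosely in the spirit of regularized evolution (Real et al., 2019); the population size, tournament size, and aging scheme are simplified assumptions, and evaluate() stands in for any performance estimator.

import copy
import random

def mutate(arch):
    # Exploration: change the operation of one randomly chosen block.
    child = copy.copy(arch)
    child[random.randrange(len(child))] = random.choice(CANDIDATE_OPS)
    return child

def evolve(evaluate, population_size=20, cycles=100, tournament=5):
    # Score an initial random population of encoded architectures.
    scored = [(evaluate(a), a) for a in
              (sample_architecture() for _ in range(population_size))]
    for _ in range(cycles):
        # Exploitation: the best of a random tournament becomes the parent.
        parent = max(random.sample(scored, tournament), key=lambda t: t[0])[1]
        child = mutate(parent)
        scored.append((evaluate(child), child))
        scored.pop(0)  # aging: discard the oldest individual
    return max(scored, key=lambda t: t[0])[1]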
The objective of neural architecture search is typically to find architectures that achieve high predictive performance on unseen data, and performance estimation refers to the process of estimating this performance. The most direct approach is to fully train and validate each candidate architecture on the target data, but doing so is extremely time-consuming, on the order of thousands of graphics processing unit (GPU) days. Thus, each candidate is generally not trained to convergence. Instead, methods such as weight sharing, early stopping, and searching on smaller proxy datasets are used in the performance estimation strategy, considerably reducing the time needed to estimate each candidate's performance. Weight sharing can be achieved by inheriting weights from pretrained models or by training a one-shot model whose weights are shared across different architectures, each of which is merely a subgraph of the one-shot model.
The early stopping method instead estimates final performance from early-stage validation results via learning curve extrapolation. The proxy-dataset approach searches for an architecture on a small dataset, such as CIFAR-10, and then trains the found architecture on the large target dataset, such as ImageNet.
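As a minimal sketch of learning curve extrapolation for early stopping, the snippet below fits a saturating power law to a short training prefix and predicts the final accuracy; the functional form and optimizer settings are simplifying assumptions, and practical systems use richer extrapolation models (e.g., Domhan et al., 2015; Klein et al., 2016).

import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # Accuracy rises toward the asymptote a as training proceeds.
    return a - b * np.power(t, -c)

def extrapolate_accuracy(epochs, accuracies, target_epoch):
    """Predict accuracy at target_epoch from a short training prefix."""
    params, _ = curve_fit(power_law, epochs, accuracies,
                          p0=(1.0, 1.0, 0.5), maxfev=10000)
    return float(power_law(target_epoch, *params))

# Example: train for 10 of 100 epochs, then extrapolate the final accuracy.
# predicted = extrapolate_accuracy(np.arange(1, 11), measured_accs, 100)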
Compared with neural networks developed by human experts, models found via neural architecture search exhibit better performance on various tasks, such as image classification, object detection, and semantic segmentation. For the ImageNet classification task, for example, MobileNetV3, which was found via neural architecture search, reduces FLOPs by approximately 30% compared with the manually designed MobileNetV2 while improving top-1 accuracy by 3.2%. For the Cityscapes segmentation task, Auto-DeepLab-L, also found via neural architecture search, outperforms DeepLabv3+ with only half the multiply-adds. In this survey, we review several neural architecture search methods and applications, demonstrating that networks found via neural architecture search outperform manually designed architectures on certain tasks, such as image classification, object detection, and semantic segmentation. However, insights into why specific architectures work well remain limited. Identifying common motifs, understanding why these motifs are important for high performance, and investigating whether these motifs generalize across different problems are desirable directions for future work.
Abadi M, Barham P, Chen J M, Chen Z F, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D G, Steiner B, Tucker P, Vanhoucke V, Warden P, Wicke M, Yu Y and Zheng X Q. 2016. TensorFlow: a system for large-scale machine learning//Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. Savannah, USA: USENIX Association: 265-283
Angeline P J, Saunders G M and Pollack J B. 1994. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1): 54-65[DOI: 10.1109/72.265960]
Baker B, Gupta O, Raskar R and Naik N. 2017. Accelerating neural architecture search using performance prediction[EB/OL]. [2020-05-21]. https://arxiv.org/pdf/1705.10823.pdf
Bergstra J, Bardenet R, Bengio Y and Kégl B. 2011. Algorithms for hyper-parameter optimization//Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain: Curran Associates Inc.: 2546-2554
Bergstra J and Bengio Y. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13: 281-305
Bergstra J, Yamins D and Cox D D. 2013. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures//Proceedings of the 30th International Conference on Machine Learning (ICML 2013). Atlanta, Georgia, USA: ICML: 115-123
Cai H, Chen T Y, Zhang W N, Yu Y and Wang J. 2018a. Efficient architecture search by network transformation//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA: AAAI: 2787-2794
Cai H, Gan C, Wang T, Zhang Z and Han S. 2019. Once-for-All: train one network and specialize it for efficient deployment[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1908.09791.pdf
Cai H, Zhu L and Han S. 2018b. ProxylessNAS: direct neural architecture search on target task and hardware[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1812.00332.pdf
Chen H, Zhuo L, Zhang B C, Zheng X W, Liu J Z, Ji R R and Doermann D. 2020. Binarized neural architecture search for efficient object recognition. International Journal of Computer Vision: 1-16
Chen L C, Collins M D, Zhu Y K, Papandreou G, Zoph B, Schroff F, Adam H and Shlens J. 2018. Searching for efficient multi-scale architectures for dense image prediction//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 8713-8724
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1706.05587.pdf
Chen X, Xie L X, Wu J and Tian Q. 2019a. Progressive differentiable architecture search: bridging the depth gap between search and evaluation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1294-1303[DOI: 10.1109/ICCV.2019.00138]
Chen Y K, Yang T, Zhang X Y, Meng G F, Xiao X Y and Sun J. 2019b. DetNAS: backbone search for object detection//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: 6638-6648
Chrabaszcz P, Loshchilov I and Hutter F. 2017. A downsampled variant of ImageNet as an alternative to the CIFAR datasets[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1707.08819.pdf
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego, USA: IEEE: 886-893[DOI: 10.1109/CVPR.2005.177]
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255[DOI: 10.1109/CVPR.2009.5206848]
Deng L, Li J Y, Huang J T, Yao K S, Yu D, Seide F, Seltzer M, Zweig G, He X D, Williams J, Gong Y F and Acero A. 2013. Recent advances in deep learning for speech research at Microsoft//Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada: IEEE: 8604-8608[DOI: 10.1109/ICASSP.2013.6639345]
Devlin J, Chang M W, Lee K and Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1810.04805.pdf
Domhan T, Springenberg J T and Hutter F. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves//Proceedings of the 24th International Conference on Artificial Intelligence. Buenos Aires, Argentina: AAAI: 3460-3468
Dong X Y and Yang Y. 2019. Network pruning via transformable architecture search//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: 759-770
Dong X and Yang Y. 2020. NAS-Bench-201: extending the scope of reproducible neural architecture search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/2001.00326.pdf
Du X Z, Lin T Y, Jin P C, Ghiasi G, Tan M X, Cui Y, Le Q V and Song X D. 2020. SpineNet: learning scale-permuted backbone for recognition and localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11589-11598[DOI: 10.1109/CVPR42600.2020.01161]
Elsken T, Metzen J H and Hutter F. 2018. Efficient multi-objective neural architecture search via Lamarckian evolution[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1804.09081.pdf
Elsken T, Metzen J H and Hutter F. 2019. Neural architecture search: a survey. Journal of Machine Learning Research, 20: 1-21
Floreano D, Dürr P and Mattiussi C. 2008. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1): 47-62[DOI: 10.1007/s12065-007-0002-4]
Ghiasi G, Lin T Y and Le Q V. 2019. NAS-FPN: learning scalable feature pyramid architecture for object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 7029-7038[DOI: 10.1109/CVPR.2019.00720]
Gong X Y, Chang S Y, Jiang Y F and Wang Z Y. 2019. AutoGAN: neural architecture search for generative adversarial networks//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 3223-3233[DOI: 10.1109/ICCV.2019.00332]
Gordon A, Eban E, Nachum O, Chen B, Wu H, Yang T J and Choi E. 2018. MorphNet: fast and simple resource-constrained structure learning of deep networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1586-1595[DOI: 10.1109/CVPR.2018.00171]
Guo J Y, Han K, Wang Y H, Zhang C, Yang Z H, Wu H, Chen X H and Xu C. 2020a. Hit-Detector: hierarchical trinity architecture search for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11402-11411[DOI: 10.1109/CVPR42600.2020.01142]
Guo Z C, Zhang X Y, Mu H Y, Heng W, Liu Z C, Wei Y C and Sun J. 2020b. Single path one-shot neural architecture search with uniform sampling//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 544-560[DOI: 10.1007/978-3-030-58517-4_32]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
He Y, Liu P, Wang Z W, Hu Z L and Yang Y. 2019. Filter pruning via geometric median for deep convolutional neural networks acceleration//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4340-4349[DOI: 10.1109/CVPR.2019.00447]
He Y H, Lin J, Liu Z J, Wang H R, Li L J and Han S. 2018. AMC: AutoML for model compression and acceleration on mobile devices//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 815-832[DOI: 10.1007/978-3-030-01234-2_48]
Hinton G E. 2007. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10): 428-434[DOI: 10.1016/j.tics.2007.09.004]
Howard A, Sandler M, Chen B, Wang W J, Chen L C, Tan M X, Chu G, Vasudevan V, Zhu Y K, Pang R M, Adam H and Le Q. 2019. Searching for MobileNetV3//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1314-1324[DOI: 10.1109/ICCV.2019.00140]
Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1704.04861.pdf
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 4700-4708[DOI: 10.1109/CVPR.2017.243]
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 448-456
Jozefowicz R, Zaremba W and Sutskever I. 2015. An empirical exploration of recurrent network architectures//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 2342-2350
Klein A, Falkner S, Springenberg J T and Hutter F. 2016. Learning curve prediction with Bayesian neural networks[EB/OL]. [2020-04-24]. https://openreview.net/forum?id=S11KBYclx
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 1097-1105
Kalchbrenner N, Grefenstette E and Blunsom P. 2014. A convolutional neural network for modelling sentences[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1404.2188.pdf
LeCun Y, Bengio Y and Hinton G. 2015. Deep learning. Nature, 521(7553): 436-444[DOI: 10.1038/nature14539]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324[DOI: 10.1109/5.726791]
Li H, Kadav A, Durdanovic I, Samet H and Graf H P. 2016. Pruning filters for efficient convnets[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1608.08710.pdf
Li L S, Jamieson K, DeSalvo G, Rostamizadeh A and Talwalkar A. 2017. Hyperband: a novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1): 6765-6816
Lin M B, Ji R R, Zhang Y X, Zhang B C, Wu Y J and Tian Y H. 2020. Channel pruning via automatic structure search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/2001.08565.pdf
Liu C X, Chen L C, Schroff F, Adam H, Hua W, Yuille A L and Li F F. 2019a. Auto-DeepLab: hierarchical neural architecture search for semantic image segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 82-92[DOI: 10.1109/CVPR.2019.00017]
Liu C X, Zoph B, Neumann M, Shlens J, Hua W, Li L J, Li F F, Yuille A, Huang J and Murphy K. 2018a. Progressive neural architecture search//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 19-35[DOI: 10.1007/978-3-030-01246-5_2]
Liu H, Simonyan K and Yang Y. 2018b. DARTS: differentiable architecture search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1806.09055.pdf
Liu N, Ma X L, Xu Z Y, Wang Y Z, Tang J and Ye J P. 2020. AutoCompress: an automatic DNN structured pruning framework for ultra-high compression rates//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Conference on Innovative Applications of Artificial Intelligence, the 10th Symposium on Educational Advances in Artificial Intelligence. Palo Alto, USA: AAAI: 4876-4883[DOI: 10.1609/aaai.v34i04.5924]
Liu Z C, Mu H Y, Zhang X Y, Guo Z C, Yang X, Cheng K T and Sun J. 2019b. MetaPruning: meta learning for automatic neural network channel pruning//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 3296-3305[DOI: 10.1109/ICCV.2019.00339]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 21-37
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440
Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110[DOI: 10.1023/B:VISI.0000029664.99615.94]
Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 483-499[DOI: 10.1007/978-3-319-46484-8_29]
Noh H, Hong S and Han B Y. 2015. Learning deconvolution network for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1520-1528[DOI: 10.1109/ICCV.2015.178]
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J and Chintala S. 2019. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1912.01703.pdf
Pham H, Guan M, Zoph B, Le Q V and Dean J. 2018. Efficient neural architecture search via parameter sharing[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1802.03268v2.pdf
Rawal A and Miikkulainen R. 2018. From nodes to networks: evolving recurrent neural networks[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1803.04439.pdf
Real E, Aggarwal A, Huang Y P and Le Q V. 2019. Regularized evolution for image classifier architecture search//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI: 4780-4789[DOI: 10.1609/aaai.v33i01.33014780]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520[DOI: 10.1109/CVPR.2018.00474]
Saxena S and Verbeek J. 2016. Convolutional neural fabrics//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 4053-4061
Sergeev A and Del Balso M. 2018. Horovod: fast and easy distributed deep learning in TensorFlow[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1802.05799.pdf
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1409.1556.pdf
Singh P, Verma V K, Rai P and Namboodiri V P. 2019. HetConv: heterogeneous kernel-based convolutions for deep CNNs//Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4835-4844
Snoek J, Larochelle H and Adams R P. 2012. Practical Bayesian optimization of machine learning algorithms//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 2951-2959
Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M M A, Prabhat and Adams R P. 2015. Scalable Bayesian optimization using deep neural networks//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 2171-2180
Stamoulis D, Ding R Z, Wang D, Lymberopoulos D, Priyantha B, Liu J and Marculescu D. 2020. Single-path NAS: designing hardware-efficient ConvNets in less than 4 hours//Proceedings of 2019 European Conference on Machine Learning and Knowledge Discovery in Databases. Würzburg, Germany: Springer: 481-497[DOI: 10.1007/978-3-030-46147-8_29]
Stanley K O, D'Ambrosio D B and Gauci J. 2009. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2): 185-212[DOI: 10.1162/artl.2009.15.2.15202]
Stanley K O and Miikkulainen R. 2002. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2): 99-127[DOI: 10.1162/106365602320169811]
Sun K, Li M J, Liu D and Wang J D. 2018. IGCV3: interleaved low-rank group convolutions for efficient deep neural networks[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1806.00178.pdf
Swersky K, Snoek J and Adams R P. 2014. Freeze-thaw Bayesian optimization[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1406.3896.pdf
Szegedy C, Ioffe S, Vanhoucke V and Alemi A A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 4278-4284
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z. 2016. Rethinking the inception architecture for computer vision//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2818-2826[DOI: 10.1109/CVPR.2016.308]
Tan M X, Chen B, Pang R M, Vasudevan V, Sandler M, Howard A and Le Q V. 2019. MnasNet: platform-aware neural architecture search for mobile//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2820-2828[DOI: 10.1109/CVPR.2019.00293]
Wan A, Dai X L, Zhang P Z, He Z J, Tian Y D, Xie S N, Wu B C, Yu M, Xu T, Chen K, Vajda P and Gonzalez J E. 2020. FBNetV2: differentiable neural architecture search for spatial and channel dimensions//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 12962-12971[DOI: 10.1109/CVPR42600.2020.01298]
Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C H and Zhang Y N. 2020. NAS-FCOS: fast neural architecture search for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11940-11948[DOI: 10.1109/CVPR42600.2020.01196]
Wu B C, Dai X L, Zhang P Z, Wang Y H, Sun F, Wu Y M, Tian Y D, Vajda P, Jia Y Q and Keutzer K. 2019. FBNet: hardware-aware efficient ConvNet design via differentiable neural architecture search//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 10734-10742[DOI: 10.1109/CVPR.2019.01099]
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5987-5995[DOI: 10.1109/CVPR.2017.634]
Xie S, Zheng H, Liu C and Lin L. 2018. SNAS: stochastic neural architecture search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1812.09926.pdf
Xu H, Yao L W, Li Z G, Liang X D and Zhang W. 2019. Auto-FPN: automatic network architecture adaptation for object detection beyond classification//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 6648-6657[DOI: 10.1109/ICCV.2019.00675]
Yang T J, Howard A, Chen B, Zhang X, Go A, Sandler M, Sze V and Adam H. 2018. NetAdapt: platform-aware neural network adaptation for mobile applications//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 289-304[DOI: 10.1007/978-3-030-01249-6_18]
Ying C, Klein A, Christiansen E, Murphy K and Hutter F. 2019. NAS-Bench-101: towards reproducible neural architecture search//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: ICML: 63-77
Yu J H and Huang T. 2019a. Universally slimmable networks and improved training techniques//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1803-1811[DOI: 10.1109/ICCV.2019.00189]
Yu J H and Huang T. 2019b. AutoSlim: towards one-shot architecture search for channel numbers[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1903.11728.pdf
Yu J H, Yang L J, Xu N, Yang J C and Huang T. 2018. Slimmable neural networks[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1812.08928.pdf
Zela A, Klein A, Falkner S and Hutter F. 2018. Towards automated deep learning: efficient joint neural architecture and hyperparameter search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1807.06906.pdf
Zela A, Siems J and Hutter F. 2019. NAS-Bench-1Shot1: benchmarking and dissecting one-shot neural architecture search[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/2001.10422.pdf
Zheng X W, Ji R R, Tang L, Zhang B C, Liu J Z and Tian Q. 2019. Multinomial distribution learning for effective neural architecture search//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1304-1313[DOI: 10.1109/ICCV.2019.00139]
Zhuo L A, Zhang B C, Chen H L, Yang L L, Chen C, Zhu Y J and Doermann D. 2020. CP-NAS: child-parent neural architecture search for binary neural networks[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/2005.00057
Zoph B and Le Q V. 2016. Neural architecture search with reinforcement learning[EB/OL]. [2020-04-24]. https://arxiv.org/pdf/1611.01578
Zoph B, Vasudevan V, Shlens J and Le Q V. 2018. Learning transferable architectures for scalable image recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8697-8710[DOI: 10.1109/CVPR.2018.00907]