Published: 2021-02-16
DOI: 10.11834/jig.200202
2021 | Volume 26 | Number 2

Scholar View

Survey on neural architecture search

Tang Lang, Li Huixia, Yan Chenqian, Zheng Xiawu, Ji Rongrong

Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China

Received: 2020-05-24; Revised: 2020-07-16; Preprint: 2020-07-23

Supported by: National Natural Science Foundation of China (U1705262, 61772443, 61572410, 61802324, 61702136); National Key Research and Development Program of China (2017YFC0113000, 2016YFB1001503); Key Research and Development Program of Jiangxi Province (20171ACH80022); Key Project of the Guangdong Joint Fund (2019B1515120049)

About the authors: Tang Lang, born in 1996, male, master's student; research interests: neural architecture search and model compression. E-mail: langt@stu.xmu.edu.cn;
Li Huixia, female, master's student; research interests: neural network acceleration and compression. E-mail: hxlee@stu.xmu.edu.cn;
Yan Chenqian, female, master's student; research interests: neural network acceleration and compression. E-mail: cqyan@stu.xmu.edu.cn;
Zheng Xiawu, male, Ph.D. student; research interests: neural architecture search, neural network acceleration and compression. E-mail: zhengxiawu@stu.xmu.edu.cn;
Ji Rongrong, corresponding author, male, professor; research interests: computer vision, multimedia technology, and machine learning. E-mail: rrji@xmu.edu.cn

CLC number: TP37

Document code: A

Article ID: 1006-8961(2021)02-0245-20

Abstract

Deep neural networks (DNNs) have achieved remarkable progress in recent years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One of the most important reasons for this progress is novel neural architectures, in which hierarchical feature extractors are learned from data in an end-to-end manner rather than designed manually. Neural network training can be regarded as an automatic feature engineering process, and its success has been accompanied by a growing demand for architecture engineering. At present, most neural networks are designed by human experts, a process that is time-consuming and error-prone. Consequently, interest in automated neural architecture search (NAS) has grown. NAS can be regarded as a subfield of automated machine learning, and it overlaps significantly with hyperparameter optimization and meta learning. NAS methods can be described along three dimensions: search space, search strategy, and performance estimation strategy. The search space defines which architectures can be represented in principle, and its choice largely determines the difficulty of optimization and the search time. To reduce search time, NAS is typically not applied to the entire network; instead, the network is divided into several blocks and the search is carried out inside the blocks. The blocks are then combined into a whole network following a predefined pattern, which shrinks the search space considerably. Depending on the situation, the searched block architecture may or may not be shared. If it is not shared, every block has its own architecture; if it is shared, all blocks in the network have the same architecture, which reduces search time further.

The search strategy specifies how the search space is explored. Many strategies can be used, including random search, reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization. A search strategy faces the classical exploration-exploitation trade-off. The objective of NAS is typically to find architectures that achieve high predictive performance on unseen data, and performance estimation refers to the process of estimating this performance. The most direct approach is to fully train and validate each architecture on the target data, but this is extremely time-consuming, on the order of thousands of graphics processing unit (GPU) days. Thus, candidates are generally not trained to convergence. Instead, methods such as weight sharing, early stopping, or searching on smaller proxy datasets are used to reduce the training time needed to estimate the performance of each candidate. Weight sharing can be achieved by inheriting weights from pretrained models or by training a one-shot model whose weights are shared across different architectures that are subgraphs of the one-shot model. Early stopping estimates performance from early-stage validation results via learning curve extrapolation. Training on a smaller proxy dataset finds an architecture on a small dataset, such as CIFAR-10, and then trains that architecture on the large target dataset, such as ImageNet. Compared with networks designed by human experts, models found via NAS perform better on various tasks, such as image classification, object detection, and semantic segmentation.

On the ImageNet classification task, for example, MobileNetV3, which was found via NAS, requires approximately 30% fewer FLOPs than the manually designed MobileNetV2 while improving top-1 accuracy by 3.2%. On the Cityscapes segmentation task, Auto-DeepLab-L, found via NAS, achieves a higher mean intersection over union (mIOU) than DeepLabv3+ without ImageNet pretraining and with less than half the multiply-adds. In this survey, we review several NAS methods and applications, showing that networks found via NAS outperform manually designed architectures on tasks such as image classification, object detection, and semantic segmentation. However, there is still little insight into why specific architectures work well. Identifying common motifs, understanding why these motifs matter for high performance, and investigating whether they generalize across problems are desirable directions.

Key words

artificial intelligence; computer vision; deep neural networks (DNNs); reinforcement learning; evolutionary algorithm; neural architecture search (NAS)

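0 Introduction

With the growth in the computing power of computers and GPUs (graphics processing units), deep neural networks have been widely applied not only in artificial intelligence fields such as speech recognition, image understanding, and natural language processing (LeCun et al., 2015), but also in complex systems such as cancer detection, autonomous driving, and complex games. On recognition tasks, deep neural networks have surpassed human accuracy and represent a major breakthrough over traditional algorithms built on hand-crafted features, such as the histogram of oriented gradient (HOG) feature (Dalal and Triggs, 2005) and the scale-invariant feature transform (SIFT) feature (Lowe, 2004). These gains come mainly from the ability of deep neural networks to extract high-level features and thereby obtain a more effective representation of the input data. In addition, growing GPU computing power has allowed networks to become deeper, further strengthening their representational capacity. Architectures evolved from AlexNet (Krizhevsky et al., 2012) to VGGNet (Simonyan and Zisserman, 2014), and then to GoogLeNet (Szegedy et al., 2015) and ResNet (He et al., 2016), becoming increasingly complex.

The strong performance of deep neural networks owes much to good architecture design, which must take many factors into account. Taking convolutional neural networks as an example, the number of layers must be chosen according to the target dataset. ResNet (He et al., 2016), for instance, comes in 18-layer, 34-layer, 152-layer, and other versions, and choosing among them is a trade-off between accuracy and computation: ResNet-18 may underfit on a large dataset, whereas ResNet-152 is likely to overfit on a small dataset while also increasing the computational cost. Beyond depth, the kernel size matters. If only 3×3 kernels are used, the receptive field is small when the network has few layers; if only 7×7 kernels are used, the computational cost grows several-fold. One must therefore weigh carefully which layers use 3×3 kernels and which use 5×5 or 7×7 kernels, so that the receptive field is large enough without making the computation so heavy that the network cannot be trained to convergence. One must also decide in which layers to downsample, and whether to do so with max pooling, average pooling, or a convolution with stride greater than 1. After ResNet (He et al., 2016) showed that residual learning greatly improves the convergence of deep networks, the way layers are connected also has to be considered. From the linear structure of VGGNet (Simonyan and Zisserman, 2014) in 2014 and the multi-branch design of the Inception networks (Szegedy et al., 2015, 2016, 2017; Ioffe and Szegedy, 2015), to the residual connections of ResNet (He et al., 2016) in 2016 and the dense connections of DenseNet (Huang et al., 2017) in 2017, and then to the attention mechanism introduced by SENet (Hu et al., 2018) in 2017, architecture design has grown steadily more complex. It is no longer the design of a simple linear chain but of a directed acyclic graph, which makes it increasingly difficult to design, for a given scenario, a network that balances accuracy and computation.

Before 2017, most neural networks were designed by hand, and designing a suitable network for a specific task required a great deal of time and effort. Automated architecture design methods, known as neural architecture search (NAS), were therefore proposed. NAS resembles hyperparameter optimization, which is an important research topic in machine learning (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012, 2015; Saxena and Verbeek, 2016). However, these hyperparameter optimization methods are confined to fixed-length search spaces and are hard to apply to variable-length settings, such as searching both the structure of a network and the configuration of each layer, so they lack generality and flexibility.

NAS can be described along three dimensions: search space, search strategy, and performance estimation strategy. The search process is shown in Fig. 1: the search strategy selects an architecture a from the search space A, the performance estimation strategy estimates the generalization performance of that architecture, and the estimate is fed back to the search strategy to improve it (Elsken et al., 2019).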

Fig. 1 Illustration of the neural architecture search process
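The loop in Fig. 1 can be sketched in a few lines of Python. This is only an illustrative skeleton: the toy search space, the scoring function `estimate_performance`, and all names are assumptions made for this example and are not taken from any surveyed method. The search strategy here is plain random search, which ignores the feedback signal; a learned strategy (reinforcement learning, evolution, and so on) would use the returned score to decide what to sample next.

```python
import random

# Hypothetical toy search space A: each architecture is a dict of choices.
SEARCH_SPACE = {
    "depth": [2, 4, 6],
    "kernel_size": [3, 5, 7],
    "width": [16, 32, 64],
}


def sample_architecture(rng):
    """Search strategy (here: random search) picks one architecture a from A."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def estimate_performance(arch):
    """Stand-in for the performance estimation strategy.

    A real system would train `arch` and return its validation accuracy.
    This made-up score exists only so that the loop can run.
    """
    return (1.0
            - 0.05 * abs(arch["depth"] - 4)
            - 0.02 * abs(arch["kernel_size"] - 5)
            + arch["width"] / 1000.0)


def random_search(num_trials, seed=0):
    """Sample, estimate, and keep the best architecture seen so far."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

Calling `random_search(50)` returns the best sampled architecture together with its estimated score.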

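The search space defines which architectures can be represented. A good deal of prior knowledge is built into its design, which reduces the size of the search space to some extent and simplifies the search; it also means that all architectures in the space perform reasonably well. Search spaces are usually of two kinds: relatively simple chain structures and directed acyclic graphs. Within the search space, an individual architecture is usually encoded as a fixed-length or variable-length string of tokens.

The search strategy explores the search space and typically faces the exploration-exploitation trade-off: it should find a good solution quickly, yet it must not converge prematurely to a locally optimal architecture. Common approaches include reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization.

The goal of NAS is to find, within the defined search space, the architecture with the best generalization performance on unseen data. The performance estimation strategy estimates the generalization performance of each sampled architecture. The simplest approach is to train every network to convergence on the training data and use its result on the validation data as the estimate, but this is very time-consuming. In the method of Zoph and Le (2016), for example, the search on CIFAR-10 alone ran in parallel on 800 GPUs for about a month.

This paper focuses on architecture search for convolutional neural networks on image classification. It first reviews the development of classic deep neural networks, then discusses the strengths and weaknesses of NAS methods in terms of search space, search strategy, performance estimation strategy, and applications, then compares the performance of these methods across scenarios and datasets, and finally discusses future trends.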

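1 Concepts and review of deep neural networks

1.1 Development of deep neural networks

The concept of the neural network first appeared in the 1940s. Because performance improved only marginally, the field stagnated for a time, until LeCun et al. (1998) proposed the landmark LeNet model in the 1990s. Building on improvements to the convolutional neural network (CNN), LeNet extracts feature representations from raw pixels in a layered manner and was successfully applied to tasks such as postal code recognition, bringing neural networks back into view.

The speech recognition system developed by Microsoft in 2011 (Deng et al., 2013) and AlexNet (Krizhevsky et al., 2012) overturned the traditional algorithms of their fields, with AlexNet doing so for image recognition. These methods used effective training strategies for deep neural networks (Hinton, 2007), substantially improved model accuracy, and directly promoted the wide adoption of deep neural networks. Extracting high-level features through convolution, pooling, and nonlinear activation became the mainstream pipeline, driving progress in image recognition and detection and in artificial intelligence as a whole. Since then, deep CNNs have no longer been limited to handwritten digit recognition: DeepFace and DeepID, two high-performance models for face recognition and verification, extended DNNs to face recognition. In addition, with DNNs introduced as a new feature learning method, Kalchbrenner et al. (2014) used a DNN with max pooling to capture relations between words, prompting structural change in language and sentence modeling in natural language processing (NLP).

Given their strong results on images and text, DNNs began to excel in object detection (Liu et al., 2016; Redmon et al., 2016), image segmentation (Long et al., 2015), and other fields. ResNet (He et al., 2016) appeared in 2015. Compared with earlier architectures, its skip (shortcut) connections alleviate vanishing gradients in deep networks, which otherwise prevent the weights from being updated, and pushed CNNs toward greater depth and width. On ImageNet it reached a top-5 classification error of only 3.57%, surpassing human-level performance. Deep CNN research then entered a new stage, which in turn drove further progress in complex games (such as AlphaGo), medical diagnosis, and autonomous driving. The development history of deep neural networks is summarized in Table 1.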

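Table 1 Development history of DNNs

Year	Milestone
1942	Concept of the neural network proposed
1958	Concept of the perceptron proposed
1998	LeNet-5 proposed
2006	Deep autoencoder networks proposed
2011	Microsoft proposes speech recognition based on deep neural networks
2012	AlexNet proposed
2015	ResNet proposed
2016 onward	Research on many kinds of neural networks flourishes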

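As the computing and storage capabilities of hardware such as the CPU (central processing unit), GPU, and TPU (tensor processing unit) have improved, the time and cost of neural network computation have dropped sharply, and storing and processing large models and datasets has become feasible. Deep CNNs have grown deeper and wider and their structures more complex, and the number of distinct architectures has surged, with DenseNet (Huang et al., 2017), ResNeXt (Xie et al., 2017), and SENet (Hu et al., 2018) appearing in succession. As depth increases, so do computation and parameter counts, and architectures aimed at compressing and accelerating deep neural networks have become a new research direction. Concepts such as depthwise separable convolution (Sandler et al., 2018), interleaved group convolution (IGC) (Sun et al., 2018), and heterogeneous kernel-based convolution (Singh et al., 2019) were proposed one after another and advanced the deployment of deep neural networks on mobile devices. Table 2 summarizes representative milestone architectures and their key properties.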

Table 2 Summary of popular DNNs

Method	Top-1 accuracy/%	Top-5 accuracy/%	Input resolution/pixels	Kernel sizes	Convolutional depth	Parameters
AlexNet	56.6	80.2	224×224	3, 5, 11	5	6.14×10⁷
VGG-16	70.3	89.4	224×224	3	13	1.38×10⁸
GoogLeNet	68.9	89.1	224×224	1, 3, 5, 7	21	7.0×10⁶
ResNet-50	75.1	92.3	224×224	1, 3, 7	49	2.55×10⁷
MobileNetV2	72.2	90.5	224×224	3	20	3.4×10⁶
Note: the accuracies in the table are tied to a specific dataset.

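As new operations and connection patterns keep appearing, the number of possible architecture combinations grows exponentially. Finding the best combination among them by hand is increasingly impractical, so automated architecture design has become the natural choice.

1.2 Terminology and core components of convolutional neural networks

1.2.1 Convolutional layer

The convolutional layer (ConvLayer) is a key component of deep CNNs, and each one performs a fairly complex high-dimensional convolution. Each output feature map, called a channel, is the result of convolving the input feature map with a convolution kernel (filter): each small window of the input, selected by sliding with the given stride, is multiplied element-wise with the kernel and the products are summed. Taking an ordinary RGB image as an example, the process is shown in Fig. 2, where the asterisk denotes convolution and H, W, C and H′, W′, C′ are the height, width, and number of channels of the input and output, respectively.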

Fig. 2 High-dimensional convolution

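Specifically, the convolution can be written as

$ \boldsymbol{O}_{h', w', c'} = \sum\limits_{i = 1}^{k} \sum\limits_{j = 1}^{k} \sum\limits_{c = 1}^{C} \boldsymbol{K}_{i, j, c, c'} \boldsymbol{I}_{h_i, w_j, c} $

where $\boldsymbol{I}$ is the input feature map with $C$ channels and spatial dimensions $H$ and $W$, whose elements are $\boldsymbol{I}_{h_i, w_j, c}$; here $(h_i, w_j)$ denotes the input position covered by kernel offset $(i, j)$ when computing output position $(h', w')$. $\boldsymbol{O}$ is the output feature map with $C'$ channels and spatial dimensions $H'$ and $W'$, whose elements are $\boldsymbol{O}_{h', w', c'}$. The kernel $\boldsymbol{K}$ consists of $C'$ 3-D kernels and has dimensions $k \times k \times C \times C'$. As Fig. 2 shows, stacked convolutions apply linear transformations that move spatial information into the channel dimension, which allows features to be extracted effectively.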
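The triple sum above can be checked with a direct NumPy implementation. The sketch below is a minimal, unoptimized version that assumes no padding and a channels-last layout; the function name `conv2d_naive` and the `stride` argument are choices made for this example.

```python
import numpy as np


def conv2d_naive(inputs, kernels, stride=1):
    """Direct evaluation of O[h', w', c'] = sum_{i,j,c} K[i,j,c,c'] * I[h_i, w_j, c].

    inputs:  array of shape (H, W, C)
    kernels: array of shape (k, k, C, C_out)
    No padding is applied, so H' = (H - k) // stride + 1 (same for W').
    """
    H, W, C = inputs.shape
    k, _, _, c_out = kernels.shape
    h_out = (H - k) // stride + 1
    w_out = (W - k) // stride + 1
    out = np.zeros((h_out, w_out, c_out))
    for h in range(h_out):
        for w in range(w_out):
            # k x k x C window of the input covered by the kernel
            patch = inputs[h * stride:h * stride + k,
                           w * stride:w * stride + k, :]
            # Contract over i, j, c; one value remains per output channel c'
            out[h, w, :] = np.tensordot(patch, kernels,
                                        axes=([0, 1, 2], [0, 1, 2]))
    return out
```

For an all-ones 5×5×3 input and all-ones 3×3×3×2 kernels, every output element is 3·3·3 = 27 and the output shape is 3×3×2.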

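1.2.2 Fully connected layer

The fully connected layer acts as the classifier of a deep CNN, mapping the learned "distributed feature representation" to the sample label space. It can be viewed as a special convolution with kernel size 1 in which the input and output feature maps are densely connected. It is computed as Z=WX, where X is the input feature map of the layer, W is the weight matrix, and Z is the output feature matrix.

1.2.3 Pooling layer

The pooling layer mimics the way human vision reduces and abstracts what it sees. It is an important downsampling operation that shrinks the spatial size of feature maps and reduces computation, makes the model attend to whether a feature is present rather than exactly where it is, and mitigates overfitting to some extent. The most common variants are max pooling and average pooling, which divide the input into rectangular regions and take the maximum or the mean of each region to form a new feature map. Fig. 3 illustrates both on a 2D matrix. Because max pooling and average pooling keep, respectively, the maximum and the mean of each small patch, much research has sought a balance between the two.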

Fig. 3 Pooling operations

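1.2.4 Activation layer

A network made only of convolutions applies nothing but a linear transformation to its input. To let the network learn nonlinear functions and scenarios, activation layers are introduced, which allow it to approximate arbitrary nonlinear functions. Common activation functions include Sigmoid, ReLU, and Tanh.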

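1.2.5 Normalization layer

A neural network is a feature extractor built from combinations of convolution, pooling, and activation layers. During training, the weights and the activation distribution of every layer keep changing, and as depth increases these changes propagate layer by layer, shifting the distribution of the resulting feature maps. To address this problem, Google proposed batch normalization in 2015. It is computed as follows:

1) Compute the mean ($mean$) and variance ($var$) of the batch;

2) Normalize the batch as $(x - mean)/\sqrt{var + \varepsilon}$, where $\varepsilon$ is a small constant added for numerical stability, to obtain $x'$;

3) Apply a scale and a shift with the learnable parameters $\lambda$ and $\beta$, that is, $y = \lambda \cdot x' + \beta$.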
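The three steps above can be sketched in NumPy as follows. This is a training-time forward pass only, under the standard formulation that divides by the square root of the variance plus a small constant; the names `scale` and `shift` (for $\lambda$ and $\beta$) and the default `eps` are choices made for this example, and the running statistics used at inference time are omitted.

```python
import numpy as np


def batch_norm(x, scale, shift, eps=1e-5):
    """Batch normalization over the batch axis (axis 0).

    x:     array of shape (batch, features)
    scale: learnable per-feature scale (lambda in the text)
    shift: learnable per-feature shift (beta in the text)
    """
    mean = x.mean(axis=0)                    # step 1: batch mean
    var = x.var(axis=0)                      # step 1: batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # step 2: normalize
    return scale * x_hat + shift             # step 3: scale and shift
```

With `scale` set to ones and `shift` set to zeros, each output feature has approximately zero mean and unit variance over the batch.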

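1.3 Classic deep neural network models

Since the concept of neural networks was proposed, many classic models have emerged, each with its own architecture and emphasis. They differ in the number of layers, layer types, layer parameters (such as kernel size and input/output channels), and intra-layer and inter-layer connections. Classic deep CNN models include LeNet (LeCun et al., 1998), AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan and Zisserman, 2014), GoogLeNet (Szegedy et al., 2015), ResNet (He et al., 2016), and MobileNet (Howard et al., 2017).

LeNet, proposed by LeCun et al. (1998) for handwritten character recognition, is the earliest convolutional neural network and the first neural network architecture to be applied successfully. As an early design, it uses only two convolutional layers and two fully connected layers, with 5×5 kernels throughout, 20 kernels in the first layer and 50 in the second. Each convolution is followed by a Sigmoid activation to add nonlinearity, and the features are downsampled by average pooling.

AlexNet was the first neural network architecture to win the ImageNet competition, and it laid a solid foundation for later progress in artificial intelligence for several reasons. It uses 3×3 kernels in part of the network (alongside the 5×5 and 11×11 kernels listed in Table 2), which reduces computation. It introduced several techniques that speed up training and improve classification accuracy, such as the ReLU activation in place of the traditional Sigmoid and Tanh, and local response normalization (LRN) to normalize output distributions. To speed up training, it was the first to split the model across two GPUs. Data augmentation and Dropout were used to mitigate overfitting.

VGGNet was also a winning entry in the ImageNet competition. It uses 3×3 kernels throughout in place of larger kernels such as 5×5, stacking several layers to obtain the same receptive field, and its features extend, generalize, and transfer well.

GoogLeNet won the classification task of the 2014 ImageNet competition. It introduced the Inception module, shown in Fig. 4.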

Fig. 4 Inception module of GoogLeNet

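GoogLeNet aims to fuse feature maps from receptive fields of several scales. Each module uses 1×1, 3×3, and 5×5 kernels at the same time, applies max pooling for downsampling after the convolutions whose spatial size is larger than 1, and finally concatenates (concat) all the convolution results to form the output feature map of the module. Kernels of different sizes make the network adapt to objects of different scales, and the modular structure makes the network easy to extend.

ResNet, proposed by Microsoft in 2015, was the first architecture to exceed human-level recognition on ImageNet. Its main component is the residual block, which introduces the skip (shortcut) connection: the block input (identity mapping) is fused with the output produced by the block's mapping (linear projection). This overcomes vanishing gradients during training and laid the foundation for deeper and larger networks. The basic structure of the residual block (He et al., 2016) is shown in Fig. 5.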

Fig. 5 Residual block in ResNet (He et al., 2016)

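As networks grew deeper and wider, their computational cost kept rising. To deploy models on edge devices, research on lightweight architectures emerged. One of the most representative works is MobileNet, which is built on depthwise separable convolution: the feature map of each channel is convolved with exactly one kernel of its own, which greatly reduces computation.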
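The saving from depthwise separable convolution can be made concrete by counting weights. The sketch below assumes the usual factorization into a k×k depthwise convolution (one kernel per input channel) followed by a 1×1 pointwise convolution, with biases ignored; the function names are chosen for this example.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable / standard weight count; equals 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

For k = 3, 32 input channels, and 64 output channels, the standard layer has 18 432 weights and the separable one has 2 336, a ratio of about 0.127. Because both layers are applied at every output position, the same ratio holds for multiply-adds.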

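2 Neural architecture search algorithms

2.1 Search space

The search space defines which architectures can be represented. In the method of Zoph and Le (2016), the search space for each layer of a convolutional network consists of the filter height, filter width, stride height, stride width, number of filters, and anchor points for residual connections. Searching in this space amounts to searching the whole network directly. The resulting convolutional architecture (Zoph and Le, 2016) is shown in Fig. 6.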

Fig. 6 Architecture found by searching over the whole network (Zoph and Le, 2016)

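To reduce computation time, NAS usually searches only for certain cells and then stacks them into the full network, with the stacking pattern typically defined by hand from prior knowledge. The methods of Zoph et al. (2018), Real et al. (2019), Pham et al. (2018), Liu et al. (2018b), and Xie et al. (2018) distinguish two kinds of cells: the normal cell and the reduction cell. A reduction cell downsamples, halving the spatial size of the feature map and doubling the number of channels. Several normal cells are usually followed by one reduction cell, and this pattern is repeated N times to build the whole network (Zoph et al., 2018), as shown in Fig. 7.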

Fig. 7 Normal cells and reduction cells stacked into the full network (Zoph et al., 2018)

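NAS is usually carried out inside the cells that make up the network. To save computation, in the methods of Zoph et al. (2018), Real et al. (2019), Liu et al. (2018b), and Xie et al. (2018), all normal cells have the same internal structure, as do all reduction cells. Only two cell structures are searched, and they are shared across all normal or reduction cells. A typical example is the NASNet search space (Zoph et al., 2018), which divides each cell into several elements and constructs each element as follows:

1) Select two of the outputs of all preceding cells as the inputs of the element.

2) Sample one operation for each of the two selected inputs. The candidate operations are identity, 1×3 followed by 3×1 convolution, 1×7 followed by 7×1 convolution, 3×3 dilated convolution, 3×3 average pooling (avg), 3×3, 5×5, and 7×7 max pooling (max), 1×1 and 3×3 convolution, and 3×3, 5×5, and 7×7 depthwise-separable convolution (sep).

3) Apply the sampled operations to the two inputs and combine the two results by element-wise addition (add) or by concatenation along the channel dimension (concat) to obtain the output of the element.

4) Concatenate the outputs of the elements in the cell to obtain the output of the cell. The normal cell and reduction cell found by Zoph et al. (2018) are shown in Fig. 8, where h denotes a feature map, $h_j$ is the input of layer j, and $h_{j+1}$ is the output of layer j. Zoph et al. (2018), Real et al. (2019), and Liu et al. (2018b) all use similar search spaces.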

Fig. 8 Normal cell and reduction cell found in the NASNet search space (Zoph et al., 2018)
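Steps 1) to 3) above amount to sampling, for every element, two inputs, two operations, and one combination method. The sketch below encodes a cell as a list of such choices. It is an illustration only: the operation strings are shorthand for the list in step 2), the number of elements per cell is a free parameter here, and allowing an element to take the outputs of earlier elements in the same cell as inputs is an assumption of this example.

```python
import random

# Shorthand names for the candidate operations listed in step 2).
OPS = [
    "identity", "1x3+3x1 conv", "1x7+7x1 conv", "3x3 dilated conv",
    "3x3 avg pool", "3x3 max pool", "5x5 max pool", "7x7 max pool",
    "1x1 conv", "3x3 conv", "3x3 sep conv", "5x5 sep conv", "7x7 sep conv",
]
COMBINE = ["add", "concat"]


def sample_cell(num_elements=5, seed=None):
    """Randomly sample the encoding of one cell.

    Input indices 0 and 1 stand for the outputs of the two previous cells;
    index 2 + i stands for the output of element i in this cell (assumed).
    """
    rng = random.Random(seed)
    num_inputs = 2
    cell = []
    for _ in range(num_elements):
        element = {
            "inputs": (rng.randrange(num_inputs), rng.randrange(num_inputs)),
            "ops": (rng.choice(OPS), rng.choice(OPS)),
            "combine": rng.choice(COMBINE),
        }
        cell.append(element)
        num_inputs += 1  # this element's output becomes a candidate input
    return cell
```

A controller or evolutionary algorithm would replace the uniform random choices with learned or mutated ones; the encoding itself stays the same.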

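The DARTS (differentiable architecture search) (Liu et al., 2018b) search space is similar to that of NASNet (Zoph et al., 2018): it also works at the cell level and searches for a normal cell and a reduction cell. It is smaller, however, because only the operations on the edges are searched. As shown in Fig. 9, the topology of a cell is predefined as a fully connected directed acyclic graph. During the search only the operation on each edge is optimized, and afterwards a subset of the edges is retained according to the operation weights. The candidate operations are 3×3 and 5×5 depthwise separable convolution, 3×3 and 5×5 dilated separable convolution, 3×3 max pooling, 3×3 average pooling, identity, and zero (which removes the connection).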

Fig. 9 DARTS search space (Liu et al., 2018b)
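The idea of weighting candidate operations on an edge and then keeping the strongest one can be sketched as follows. This toy version assumes a softmax over one architecture parameter per operation and uses trivial stand-in operations; it shows only the forward computation of a single edge, not the bilevel optimization of weights and architecture parameters, and all names are chosen for this example.

```python
import numpy as np


def softmax(alpha):
    """Numerically stable softmax over a 1-D array of architecture parameters."""
    exp = np.exp(alpha - np.max(alpha))
    return exp / exp.sum()


def mixed_op(x, alpha, ops):
    """Output of one edge: softmax-weighted sum of all candidate operations."""
    weights = softmax(np.asarray(alpha, dtype=float))
    return sum(w * op(x) for w, op in zip(weights, ops))


def discretize(alpha, op_names):
    """After the search, keep the operation with the largest weight on the edge."""
    return op_names[int(np.argmax(alpha))]
```

With candidates identity, zero, and doubling, and equal parameters, the edge output for an all-ones input is (1 + 0 + 2)/3 = 1. Because the output is a smooth function of `alpha`, the architecture parameters can be trained by gradient descent in an autodiff framework.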

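Both the NASNet and the DARTS search spaces search for a normal cell and a reduction cell, and all normal or reduction cells in the network share the same structure. Sharing structures in this way saves time but lacks flexibility.

Another kind of search space also searches inside cells but does not share structures between cells. Such search spaces usually take an existing network as the backbone. Hand-designed networks are typically also stacks of identical units, and this kind of search space further tunes those existing units. ProxylessNAS (Cai et al., 2018b), MDENAS (Zheng et al., 2019), MnasNet (Tan et al., 2019), MobileNetV3 (Howard et al., 2019), FBNet (Wu et al., 2019), and FBNetV2 (Wan et al., 2020) all search on top of MobileNetV2 (Sandler et al., 2018), changing the number of convolutional layers, the kernel size, and the number of filters inside its units. ProxylessNAS (Cai et al., 2018b) changes the kernel size of the MBConv unit in MobileNetV2 from a fixed 3×3 to a choice among {3×3, 5×5, 7×7}, changes the expansion ratio from a fixed 6 to a choice among {3, 6}, and skips some MBConv units with a certain probability. These options are applied to every MBConv unit to form the search space. Unlike NASNet (Zoph et al., 2018) and DARTS (Liu et al., 2018b), the MBConv units in ProxylessNAS do not share a structure, so every unit may differ, which gives more flexibility, and searching on top of MobileNetV2 builds in the experience of manual architecture design. MnasNet (Tan et al., 2019) and MobileNetV3 (Howard et al., 2019) were also found by searching on top of MobileNetV2. Their search spaces differ from that of ProxylessNAS mainly in the added option of whether to use a squeeze-and-excitation module. The MnasNet search space is shown in Fig. 10.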

Fig. 10 Search space of MnasNet (Tan et al., 2019)

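2.2 Search strategy

The search strategy explores the architectures in the search space and tries to find the best-performing one. Common strategies include random search, Bayesian optimization, evolutionary methods, reinforcement learning, and gradient-based methods.

Evolutionary algorithms were already applied to neural networks in early work (Angeline et al., 1994; Stanley and Miikkulainen, 2002; Floreano et al., 2008; Stanley et al., 2009; Jozefowicz et al., 2015), although mostly to optimize weights rather than architectures. Bayesian optimization was also used in early architecture search: Bergstra et al. (2013) used it to find state-of-the-art architectures, and Domhan et al. (2015) found a state-of-the-art architecture on CIFAR-10 without data augmentation. The recent progress of NAS stems mainly from Zoph and Le (2016), who explore the search space with reinforcement learning. As shown in Fig. 11, to express an architecture as a variable-length encoding, Zoph and Le (2016) use an RNN (recurrent neural network) as the reinforcement learning agent and optimize the expected performance of the architectures sampled by the RNN with the policy gradient method.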

Fig. 11 Neural architecture search based on reinforcement learning (Zoph and Le, 2016)

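Real et al. (2019) search for networks with regularized evolution and compare it with other methods under the same search space and dataset. As shown in Fig. 12, when searching on CIFAR-10 in the same NASNet search space, regularized evolution converges faster than reinforcement learning and also reaches a better final result (Real et al., 2019). Overall, regularized evolution outperforms reinforcement learning, which in turn outperforms random search. Cai et al. (2019) likewise use an evolutionary algorithm to find the best-performing architecture in the search space.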

Fig. 12 Comparison between regularized evolution, reinforcement learning, and random search (Real et al., 2019)
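A compact sketch of regularized (aging) evolution is given below. It follows the commonly described scheme of tournament selection plus removal of the oldest individual rather than the worst one; the bit-string "architectures", the mutation, and the fitness function are toy stand-ins invented for this example, and the hyperparameters are arbitrary.

```python
import collections
import random


def regularized_evolution(cycles, population_size, sample_size,
                          random_arch, mutate, evaluate, seed=0):
    """Aging evolution: the oldest individual is removed at every step."""
    rng = random.Random(seed)
    population = collections.deque()
    history = []
    # Initialize the population with random architectures.
    while len(population) < population_size:
        arch = random_arch(rng)
        item = (arch, evaluate(arch))
        population.append(item)
        history.append(item)
    # Evolve until `cycles` architectures have been evaluated in total.
    while len(history) < cycles:
        candidates = rng.sample(list(population), sample_size)
        parent = max(candidates, key=lambda it: it[1])  # tournament selection
        child = mutate(parent[0], rng)
        item = (child, evaluate(child))
        population.append(item)
        history.append(item)
        population.popleft()  # discard the oldest, not the worst
    return max(history, key=lambda it: it[1])


# Toy problem (assumption): an architecture is 8 bits, fitness is the number of ones.
def toy_random_arch(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def toy_mutate(arch, rng):
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]  # flip one randomly chosen bit
    return child


def toy_evaluate(arch):
    return sum(arch)
```

Calling `regularized_evolution(200, 20, 5, toy_random_arch, toy_mutate, toy_evaluate)` returns the best (architecture, fitness) pair seen during the run. In a real search, `evaluate` would train the candidate network and return its validation accuracy.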

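DARTS (Liu et al., 2018b) uses a gradient-based method together with parameter sharing during training, which speeds up architecture search considerably. The early method of Zoph and Le (2016) trained on 800 GPUs for nearly a month, whereas DARTS needs only 1 day on 1 GPU. ProxylessNAS (Cai et al., 2018b) also uses reinforcement learning or a gradient-based search algorithm and mainly addresses the excessive GPU memory consumption of DARTS by replacing the mixed operation with a probabilistically sampled operation. With N candidate operations on each edge, this reduces memory consumption to 1/N. MDENAS (Zheng et al., 2019) improves on DARTS further by modeling the choice of operation on each edge as sampling from a multinomial distribution and optimizing the parameters of that distribution during training, so that the search converges to a good result in 4 h on 1 GPU.

2.3 Performance estimation strategy

The performance estimation strategy estimates the performance of each sampled architecture. The simplest approach is to train every candidate to convergence and use its accuracy on the validation set as the estimate. The methods of Zoph and Le (2016), Zoph et al. (2018), Liu et al. (2019a), and Real et al. (2019) all reinitialize and train each candidate architecture separately. Zoph and Le (2016) searched 12 800 architectures in total, taking nearly a month on 800 GPUs, and Real et al. (2019) trained for 7 d on 450 GPUs; both are very expensive.

Most of the computation in NAS comes from evaluating network performance. Many works try to address this problem, and the main approaches fall roughly into four categories:

1) Lower-fidelity estimates: train for fewer steps, train on a subset of the data, or train on lower-resolution data.

2) Modify the structure of an existing pretrained model and inherit the weights of the original model during training.

3) Extrapolate the learning curve to predict accuracy.

4) Treat every candidate as a substructure of a larger supernetwork and let all substructures inherit the weights of the supernetwork. This approach is usually called one-shot.

For example, Zoph et al. (2018) and Zela et al. (2018) train networks for a relatively short time, whereas Klein et al. (2016) train on a subset of the data. The former is arguably more reasonable, because for the same number of iterations, training on the whole dataset gives a more accurate gradient estimate at each step. Chrabaszcz et al. (2017) train on low-resolution images, and Zoph et al. (2018) train with fewer cells.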

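Swersky et al. (2014), Domhan et al. (2015), Klein et al. (2016), Baker et al. (2017), and Rawal and Miikkulainen (2018) speed up model evaluation by extrapolating the learning curve of a network. Domhan et al. (2015) train a model briefly and extrapolate from the initial learning curve to predict later accuracy, while Swersky et al. (2014), Klein et al. (2016), Baker et al. (2017), and Rawal and Miikkulainen (2018) also use architectural hyperparameters when predicting from partial learning curves. Liu et al. (2018a) and Cai et al. (2019) instead train an additional surrogate model for accuracy prediction, which takes the encoding of an architecture as input and outputs its predicted performance. Predicting performance with a surrogate model is currently the fastest performance estimation strategy, since it does not even require a forward pass of the candidate on the validation set, but it demands a highly accurate surrogate.

Many works (Guo et al., 2020b; Liu et al., 2018b; Pham et al., 2018; Cai et al., 2019) perform architecture search in a one-shot manner, in which every architecture in the search space is a subnetwork of one supernetwork. This makes weight sharing between architectures straightforward and greatly shortens the training time needed for performance estimation. Methods represented by ENAS (efficient neural architecture search) (Pham et al., 2018) and DARTS (Liu et al., 2018b) share the weights of operations (for example, a 3×3 convolution) across different architectures. As shown in Fig. 13, ENAS (Pham et al., 2018) samples subgraphs from a large supergraph, and the operations in a subgraph inherit the weights of the supergraph.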

Fig. 13 Sampling a subgraph from the supergraph (Pham et al., 2018)

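In the methods of Guo et al. (2020b), Cai et al. (2019), and Stamoulis et al. (2020), weight sharing is finer grained: besides sharing weights between subgraphs, weights are also shared between different operations. As shown in Fig. 14, Stamoulis et al. (2020) share the weights of a 3×3 kernel with the central part of a 5×5 kernel, and the weights of the 5×5 kernel with the central part of a 7×7 kernel. Such methods are usually called single-path.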

Fig. 14 Weight sharing between convolution kernels of different sizes (Stamoulis et al., 2020)

((a) multi-path search space; (b) single-path search space)
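The nesting of kernels shown in Fig. 14 can be illustrated with array slicing: the smaller kernels are simply the central crops of the largest one, so all of them are backed by the same parameters. The helper below is a sketch for a single 2-D kernel; the function name is chosen for this example, and channel dimensions as well as the mechanism that decides which size to use are omitted.

```python
import numpy as np


def center_crop_kernel(kernel, size):
    """Return the central size x size part of a square kernel as a view.

    Because NumPy basic slicing returns a view, the cropped kernel shares
    memory with the larger one, which mirrors the weight-sharing idea.
    """
    k = kernel.shape[0]
    start = (k - size) // 2
    return kernel[start:start + size, start:start + size]
```

For a 7×7 kernel `k7`, `center_crop_kernel(k7, 5)` and `center_crop_kernel(k7, 3)` give the shared 5×5 and 3×3 kernels, and an update to the center of `k7` is visible in both.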

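Guo et al. (2020b), Yu et al. (2018), and Yu and Huang (2019a, b) use another form of single-path method, which shares weights between convolutions with different numbers of input/output channels rather than between kernels of different sizes, as shown in Fig. 15 (Guo et al., 2020b). OFA (once for all) (Cai et al., 2019) combines weight sharing between the operations of different subgraphs, between kernels of different sizes, and between convolutions with different numbers of input/output channels. OFA first trains the largest supernetwork to convergence and then uses it to train the subnetworks through knowledge distillation. Experiments show that knowledge distillation in the one-shot setting speeds up the convergence of the subnetworks.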

Fig. 15 Weight sharing between convolutions with different numbers of input/output channels (Guo et al., 2020b)

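2.4 Applications

In computer vision, NAS research has focused mainly on classification, and the resulting search techniques are then used to find backbone networks for other vision tasks such as detection and segmentation.

2.4.1 Object detection

Representative NAS works in detection include NAS-FCOS (Wang et al., 2020), NAS-FPN (Ghiasi et al., 2019), and DetNAS (Chen et al., 2019b). Chen et al. (2019b) adopt the SinglePathNAS search algorithm, train the supernet on ImageNet, and search for the backbone under the guidance of the object detection task. Ghiasi et al. (2019) use reinforcement learning to search for the feature pyramid network (FPN) of a detector, as shown in Fig. 16.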

Fig. 16 Feature pyramid architecture search in NAS-FPN (Ghiasi et al., 2019)

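Similarly, Auto-FPN (Xu et al., 2019) borrows the idea of DARTS to search automatically for the FPN and head of a detection network. Hit-Detector (Guo et al., 2020a) uses group sparsity regularization to select suitable sub-search spaces for the backbone, FPN, and head, and then performs end-to-end architecture search within them. SpineNet (Du et al., 2020) searches for a permutation of the feature blocks of the backbone to produce an architecture with cross-scale connections.

2.4.2 Semantic segmentation

Beyond object detection, architecture search has also achieved strong results in segmentation (Chen et al., 2018; Liu et al., 2019a) and image generation (Gong et al., 2019). In detection, NAS-FCOS (Wang et al., 2020) mainly uses NAS to search for the FPN and head. In segmentation, the main works are Auto-DeepLab (Liu et al., 2019a) and DPC (dense prediction cell) (Chen et al., 2018). Auto-DeepLab (Liu et al., 2019a) automatically selects the downsampling path in the backbone, and its search space contains hand-designed architectures such as DeepLabv3 (Chen et al., 2017), Conv-Deconv (Noh et al., 2015), and Stacked Hourglass (Newell et al., 2016), as shown in Fig. 17 (Liu et al., 2019a).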

Fig. 17 Search space of Auto-DeepLab (Liu et al., 2019a)

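2.4.3 Acceleration and compression of neural networks

Another major application of NAS is the compression and acceleration of neural networks. Structured pruning removes certain convolution kernels directly from a trained deep network, reducing its parameters, storage, and computation. Traditional structured pruning usually relies on hand-designed criteria (Li et al., 2016; He et al., 2019) to identify kernels that carry little information and are relatively unimportant, and removes them to obtain the compressed model. Such strategies usually require the compression rate of every layer to be set by hand, lack flexibility under a given resource constraint, and have difficulty hitting a preset overall compression rate precisely. With the rise of architecture search, automated search has been brought into structured pruning. He et al. (2018) introduce deep reinforcement learning, define the per-layer compression rate as a continuous action space, and design reward functions that constrain resources and preserve accuracy; by quickly evaluating the sparse model directly on the validation set, they train a policy network and a value network to predict the compression rate of each layer automatically. Gordon et al. (2018) add to the loss function a resource-aware regularizer on the trainable parameters of batch normalization (BN) and remove the kernels whose BN parameter is 0, which determines the compression rate of each layer automatically. NetAdapt (Yang et al., 2018) progressively tightens the resource budget to generate a network that meets the resource constraints of the target hardware platform, as shown in Fig. 18. At each iteration it updates the current budget, treats each layer separately, prunes a single layer according to the L2 norm of the weights to meet the current budget, evaluates the result of pruning each layer on the target hardware platform, and keeps the layer whose pruning yields the highest accuracy.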

Fig. 18 Framework of NetAdapt (Yang et al., 2018)

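Liu et al. (2019b) encode the channel numbers of the pruned network, feed this encoding to a meta network (PruningNet) that predicts the weights of the pruned model, evaluate the pruned network with the generated weights, and then search for the best encoding with an evolutionary algorithm. Dong and Yang (2019) introduce architecture parameters that represent the distributions of network width and depth, use the Gumbel-softmax trick to turn the non-differentiable sampling from these distributions into a differentiable process so that the architecture parameters can be trained, and build the pruned network from the most probable width or depth settings. Liu et al. (2020) propose the AutoCompress framework based on simulated annealing, which consists of two stages. The first stage uses the ADMM (alternating direction method of multipliers) algorithm to make the weight distribution structurally sparse and removes the weights whose value is 0. The second stage removes the remaining redundant weights with small values by thresholding. Both stages consist of several iterations; each iteration lowers the temperature T and samples a compression action, accepting better samples and accepting worse samples with a certain probability, until T falls below a threshold and the pruned network is obtained. Model compression and acceleration based on architecture search can decide the compression rate of each layer and the weights to prune automatically, reducing the reliance on many hand-set hyperparameters. The AutoCompress framework is shown in Fig. 19.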

Fig. 19 Framework of AutoCompress (Liu et al., 2020)

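Lin et al. (2020) introduce the artificial bee colony algorithm and propose the ABC pruner, which performs a heuristic search for the pruned structure and initializes the pruned model by randomly sampling kernel weights from the pretrained model, simplifying the manual choice of compression rates and pruned weights.

In addition, quantizing weights and activations to low bit widths also greatly reduces storage and speeds up computation, and architecture search can be combined with quantization. Chen et al. (2020) bring the search strategy of PC-DARTS to the search for binarized networks and gradually remove low-probability operations during the search to shrink the search space. Zhuo et al. (2020) propose a child-parent search strategy in which a full-precision model (parent) guides the search for a binarized model (child).

3 Datasets and performance comparison

3.1 Datasets

3.1.1 Image classification datasets

Common datasets include MNIST (LeCun et al., 1998), CIFAR-10/100 (Hinton, 2007), and ImageNet (Deng et al., 2009). Their basic statistics are given in Table 3.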

Table 3 Statistics of different datasets

Dataset	Classes	Training set	Test set	Image size/pixels
MNIST	10	6×10⁴	1×10⁴	28×28
CIFAR10	10	5×10⁴	1×10⁴	32×32
CIFAR100	100	5×10⁴	1×10⁴	32×32
ImageNet	1 000	1.28×10⁶	5×10⁴	-
Note: "-" means not fixed.

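MNIST (LeCun et al., 1998) is the most widely used dataset for handwritten digit recognition. Released in 1998 as a public benchmark for character recognition algorithms, it contains 28×28-pixel grayscale images of handwritten characters in 10 classes (the digits 0 to 9), with 60 000 training samples and 10 000 test samples. LeNet-5 is the classic model that reaches a classification error of 0.9% on MNIST.

CIFAR (Hinton, 2007) is a dataset for small-image classification in which every image is only 32×32 pixels. Released in 2009 as a public benchmark for classifying small color images, it comes in two versions, CIFAR-10 and CIFAR-100. CIFAR-10 has 10 classes, with 50 000 training images and 10 000 test images. CIFAR-100 has 100 classes, with 50 000 training images and 10 000 test images. The classic ResNet-20 reaches a classification error of 8.75% on CIFAR-10. The goal of many NAS methods is to maximize classification accuracy on CIFAR-10.

ImageNet (Deng et al., 2009) is a large-scale image dataset, first proposed in 2010 and in stable use since 2012. It was created to advance image recognition and has long been the benchmark for evaluating image classification algorithms. The full dataset contains 14 197 122 images and 21 841 synset indices, where a synset is a node in the WordNet hierarchy, that is, a set of synonyms. It initially had more than 1 million images covering most categories seen in daily life, each hand-labeled with a class; by 2016 it held more than 10 million images. The ILSVRC (ImageNet Large Scale Visual Recognition Challenge) is held every year on this dataset. The commonly used ImageNet dataset is ILSVRC2012, the one used in the 2012 challenge, which covers 1 000 classes with about 1.28 million training images and 50 000 validation images used for testing. Images come in different sizes and are usually rescaled to 256×256 pixels before processing. Algorithms are usually evaluated by top-1 or top-5 classification accuracy on the validation set. The classic AlexNet (Krizhevsky et al., 2012), VGG-16 (Simonyan and Zisserman, 2014), and ResNet-50 (He et al., 2016) models have top-1 classification errors on ImageNet of 42.24%, 31.66%, and 24.64%, and top-5 classification errors of 11.55%, 11.55%, and 7.76%, respectively.

3.1.2 NAS (neural architecture search) datasets

Unlike image classification datasets, NAS-Bench-101 (Ying et al., 2019), NAS-Bench-201 (Dong and Yang, 2020), and NAS-Bench-1Shot1 (Zela et al., 2020) are datasets built for neural architecture search.

NAS-Bench-101 covers the entire search space under one particular setting. It contains 423 624 convolutional architectures together with the training processes and results of multiple training runs of each on CIFAR-10; building it took the equivalent of one year of training on a single TPU. Its purpose is to cut the time spent on estimating architecture performance so that researchers can concentrate on the search strategy. The relationship between parameter count, mean validation error, and training time over all architectures in the dataset is shown in Fig. 20 (Ying et al., 2019).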

Fig. 20 Training time vs. trainable parameters of all architectures in the NAS-Bench-101 dataset, color-coded by validation accuracy (Ying et al., 2019)

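NAS-Bench-201 (Dong and Yang, 2020) extends NAS-Bench-101 (Ying et al., 2019) with a different search space, results on multiple datasets, and more diagnostic information. Its search space is fixed, and it provides a unified benchmark for almost all recent NAS algorithms.

NAS-Bench-1Shot1 (Zela et al., 2020) is a dataset designed specifically for one-shot NAS.

3.2 Evaluation criteria

NAS is usually evaluated by two measures: the performance on the target dataset of the architecture found, and the amount of computation the search requires. In other words, NAS should be both fast and good, finding the best possible architecture in the shortest possible time.

How an architecture is evaluated on the target dataset depends on the task. Most current NAS research targets classification, where the criterion is the classification accuracy on the test set of CIFAR-10 or ImageNet after the architecture has been trained to convergence on that dataset. Some work targets other vision tasks, such as object detection and semantic segmentation. Object detection commonly uses average precision (AP) on datasets such as COCO (common objects in context), and semantic segmentation commonly uses mean intersection over union (mIOU) on datasets such as PASCAL VOC (pattern analysis, statistical modeling and computational learning visual object classes) 2012 and Cityscapes. If speed matters, criteria such as inference time are added. These criteria evaluate the performance of the architecture found on the target dataset.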
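As a concrete reference for the segmentation metric mentioned above, the sketch below computes mIOU from predicted and ground-truth label maps. It uses one common convention, which is to average the per-class intersection over union across the classes that appear in either map; benchmark toolkits differ in details such as ignore labels and dataset-level accumulation, which are not handled here.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection over union of two integer label arrays of equal shape."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return float(np.mean(ious))
```

For `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIOU is 7/12 ≈ 0.583.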

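The computation spent during a NAS search is commonly measured in GPU hours or GPU days, that is, the number of hours or days the search would take on a single GPU. Because hardware performance and framework or driver versions differ, this measure may be biased. It evaluates the speed of the NAS search algorithm.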
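The unit conversion is simple arithmetic, sketched below under the assumption that all GPUs are busy for the whole wall-clock time. With the figures quoted in Section 2.3 for Real et al. (2019), 450 GPUs for 7 d, it gives 3 150 GPU days, which matches the search time listed for AmoebaNet in Table 4. The function names are chosen for this example.

```python
def gpu_days(num_gpus, wall_clock_days):
    """Search cost in GPU days, assuming every GPU runs for the full duration."""
    return num_gpus * wall_clock_days


def gpu_hours(num_gpus, wall_clock_days):
    """The same cost expressed in GPU hours."""
    return gpu_days(num_gpus, wall_clock_days) * 24
```

Likewise, a search that takes 4 d on a single GPU costs 4 GPU days, or 96 GPU hours, which is how the DARTS entries in Table 4 and Table 5 relate to each other.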

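3.3 Search performance of NAS algorithms

Most architecture search for convolutional networks targets classification, and the common datasets are CIFAR-10 and ImageNet. Table 4 compares NAS algorithms on CIFAR-10. The one-shot methods based on parameter sharing, ENAS (Pham et al., 2018), DARTS (Liu et al., 2018b), P-DARTS (Chen et al., 2019a), and MDENAS (multinomial distribution ENAS) (Zheng et al., 2019), are generally fast and also find well-performing architectures. In addition, gradient-based optimization and multinomial-distribution-based optimization usually outperform reinforcement learning.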

Table 4 Performance of different NAS methods on the CIFAR-10 dataset

Method	Test error/%	Parameters/M	Search time/GPU days
NASNet-A	2.65	3.3	1 800
AmoebaNet-A	3.34	3.2	3 150
AmoebaNet-B	2.55	2.8	3 150
Hierarchical Evo	3.75	15.7	300
PNAS	3.41	3.2	225
ENAS	2.89	4.6	0.5
DARTS	2.83	3.4	4
MDENAS	2.55	3.6	0.16
P-DARTS	2.5	3.4	0.3
Proxyless-R	2.3	5.8	-
Proxyless-G	2.08	5.7	-
Note: "-" means unknown.

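Table 5 shows the performance of NAS algorithms on ImageNet. For reasons of time, searches targeting ImageNet are usually run in MobileNet-based search spaces. Compared with the NASNet or DARTS search spaces, architectures found in MobileNet-based search spaces require less computation and are better suited to deployment on mobile devices.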

Table 5 Performance of different NAS methods on the ImageNet dataset

Method	Test top-1 accuracy/%	FLOPs/M	Search time/GPU hours
NASNet-A	74	564	48 000
DARTS	73.1	595	96
MnasNet	74	317	40 000
ChamNet-B	73.8	323	28 000
FBNet-C	74.9	375	216
ProxylessNAS	74.6	320	200
SinglePathNAS	74.7	328	312
AutoSlim	74.2	305	180
MobileNetV3-L	75.2	219	-
OFA	76.4	238	40
FBNetV2-F4	76	238	200
Note: "-" means unknown.

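4 Tools for neural architecture search

PyTorch (Paszke et al., 2019) and TensorFlow (Abadi et al., 2016) are the most commonly used tools for training neural networks. They make it easy to build networks and compute gradients automatically, and torchprofile can be used to profile PyTorch models. NAS requires training a large number of models, so Horovod (Sergeev and Del Balso, 2018) or APEX (a PyTorch Extension) can be used for distributed training to speed things up. For computer vision tasks, DALI (NVIDIA's Data Loading Library) can move image preprocessing onto the GPU, which greatly improves GPU utilization.

5 Conclusion

This paper has described the research background of neural architecture search, reviewed and summarized representative NAS methods in detail, surveyed the datasets, evaluation criteria, and performance of mainstream methods, introduced common tools, and discussed future trends.

Research on NAS started relatively recently, and existing results point to several promising directions. 1) Reasonable evaluation metrics. Metrics such as classification accuracy, mAP (mean average precision), and mIOU measure the performance of an architecture on a dataset, but for scenarios with real-time requirements, such as mobile platforms, more unified benchmarks are needed that jointly consider inference speed, memory footprint, and similar indicators. Besides inference speed, search speed should also be compared on a common platform wherever possible. 2) Performance estimation remains time-consuming and faces a trade-off between time and accuracy, so estimation strategies that better balance the two still need to be explored. 3) NAS for unsupervised settings. Supervised tasks require large amounts of annotation, and with the rise of methods such as BERT (bidirectional encoder representations from transformers) (Devlin et al., 2018), there is demand for NAS methods aimed at unsupervised tasks.

As research on neural architecture search continues to deepen, we hope that this paper will be of some help to current and future work.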

References

Abadi M, Barham P, Chen J M, Chen Z F, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D G, Steiner B, Tucker P, Vanhoucke V, Warden P, Wicke M, Yu Y and Zheng X Q. 2016. Tensorflow: a system for large-scale machine learning//Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. Savannah, USA: USENIX Association: 265-283

Angeline P J, Saunders G M and Pollack J B. 1994. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1): 54-65 [DOI:10.1109/72.265960]

Baker B, Gupta O, Raskar R and Naik N. 2017. Accelerating neural architecture search using performance prediction[EB/OL].[2020-05-21]. https://arxiv.org/pdf/1705.10823.pdf

Bergstra J, Bardenet R, Bengio Y and Kégl B. 2011. Algorithms for hyper-parameter optimization//Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain: Curran Associates Inc.: 2546-2554

Bergstra J and Bengio Y. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13: 281-305

Bergstra J, Yamins D and Cox D D. 2013. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures//Proceedings of the 30th International Conference on Machine Learning (ICML 2013). Atlanta, Georgia: ICML: 115-123

Cai H, Chen T Y, Zhang W N, Yu Y and Wang J. 2018a. Efficient architecture search by network transformation//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA: AAAI: 2787-2794

Cai H, Gan C, Wang T, Zhang Z and Han S. 2019. Once-for-All: train one network and specialize it for efficient deployment[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1908.09791.pdf

Cai H, Zhu L and Han S. 2018b. ProxylessNAS: direct neural architecture search on target task and hardware[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1812.00332.pdf

Chen H, Zhuo L, Zhang B C, Zheng X W, Liu J Z, Ji R R, Doermann D. 2020. Binarized neural architecture search for efficient object recognition. International Journal of Computer Vision: 1-16

Chen L C, Collins M D, Zhu Y K, Papandreou G, Zoph B, Schroff F, Adam H and Shlens J. 2018. Searching for efficient multi-scale architectures for dense image prediction//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 8713-8724

Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1706.05587.pdf

Chen X, Xie L X, Wu J and Tian Q. 2019a. Progressive differentiable architecture search: bridging the depth gap between search and evaluation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1294-1303[DOI: 10.1109/ICCV.2019.00138]

Chen Y K, Yang T, Zhang X Y, Meng G F, Xiao X Y and Sun J. 2019b. DetNAS: backbone search for object detection//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: 6638-6648

Chrabaszcz P, Loshchilov I and Hutter F. 2017. A downsampled variant of ImageNet as an alternative to the CIFAR datasets[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1707.08819.pdf

Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego, USA: IEEE: 886-893[DOI: 10.1109/CVPR.2005.177]

Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255[DOI: 10.1109/CVPR.2009.5206848]

Deng L, Li J Y, Huang J T, Yao K S, Yu D, Seide F, Seltzer M, Zweig G, He X D, Williams J, Gong Y F and Acero A. 2013. Recent advances in deep learning for speech research at Microsoft//Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada: IEEE: 8604-8608[DOI: 10.1109/ICASSP.2013.6639345]

Devlin J, Chang M W, Lee K and Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1810.04805.pdf

Domhan T, Springenberg J T and Hutter F. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves//Proceedings of the 24th International Conference on Artificial Intelligence. Buenos Aires, Argentina: AAAI: 3460-3468

Dong X Y and Yang Y. 2019. Network pruning via transformable architecture search//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: 759-770

Dong X and Yang Y. 2020. NAS-Bench-201: extending the scope of reproducible neural architecture search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/2001.00326.pdf

Du X Z, Lin T Y, Jin P C, Ghiasi G, Tan M X, Cui Y, Le Q V and Song X D. 2020. SpineNet: learning scale-permuted backbone for recognition and localization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11589-11598[DOI: 10.1109/CVPR42600.2020.01161]

Elsken T, Metzen J H, Hutter F. 2018. Efficient multi-objective neural architecture search via Lamarckian evolution[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1804.09081.pdf

Elsken T, Metzen J H, Hutter F. 2019. Neural architecture search: a survey. Journal of Machine Learning Research, 20: 1-21

Floreano D, Dürr P, Mattiussi C. 2008. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1): 47-62 [DOI:10.1007/s12065-007-0002-4]

Ghiasi G, Lin T Y and Le Q V. 2019. NAS-FPN: learning scalable feature pyramid architecture for object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 7029-7038[DOI: 10.1109/CVPR.2019.00720]

Gong X Y, Chang S Y, Jiang Y F and Wang Z Y. 2019. AutoGAN: neural architecture search for generative adversarial networks//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 3223-3233[DOI: 10.1109/ICCV.2019.00332]

Gordon A, Eban E, Nachum O, Chen B, Wu H, Yang T J and Chio E. 2018. Morphnet: fast and simple resource-constrained structure learning of deep networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1586-1595[DOI: 10.1109/CVPR.2018.00171]

Guo J Y, Han K, Wang Y H, Zhang C, Yang Z H, Wu H, Chen X H and Xu C. 2020a. Hit-Detector: hierarchical trinity architecture search for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11402-11411[DOI: 10.1109/CVPR42600.2020.01142]

Guo Z C, Zhang X Y, Mu H Y, Heng W, Liu Z C, Wei Y C and Sun J. 2020b. Single path one-shot neural architecture search with uniform sampling//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 544-560[DOI: 10.1007/978-3-030-58517-4_32]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]

He Y, Liu P, Wang Z W, Hu Z L and Yang Y. 2019. Filter pruning via geometric median for deep convolutional neural networks acceleration//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4340-4349[DOI: 10.1109/CVPR.2019.00447]

He Y H, Lin J, Liu Z J, Wang H R, Li L J and Han S. 2018. AMC: automl for model compression and acceleration on mobile devices//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 815-832[DOI: 10.1007/978-3-030-01234-2_48]

Hinton G E. 2007. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10): 428-434 [DOI:10.1016/j.tics.2007.09.004]

Howard A, Sandler M, Chen B, Wang W J, Chen L C, Tan M X, Chu G, Vasudevan V, Zhu Y K, Pang R M, Adam H and Le Q. 2019. Searching for mobilenetv3//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1314-1324[DOI: 10.1109/ICCV.2019.00140]

Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. Mobilenets: efficient convolutional neural networks for mobile vision applications[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1704.04861.pdf

Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]

Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 4700-4708[DOI: 10.1109/CVPR.2017.243]

Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 448-456

Jozefowicz R, Zaremba W and Sutskever I. 2015. An empirical exploration of recurrent network architectures//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 2342-2350

Klein A, Falkner S, Springenberg J T and Hutter F. 2016. Learning curve prediction with Bayesian neural networks[EB/OL].[2020-04-24]. https://openreview.net/forum?id=S11KBYclx

Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 1097-1105

Kalchbrenner N, Grefenstette E and Blunsom P. 2014. A convolutional neural network for modelling sentences[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1404.2188.pdf

LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature, 521(7553): 436-444 [DOI:10.1038/nature14539]

LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324 [DOI:10.1109/5.726791]

Li H, Kadav A, Durdanovic I, Samet H and Graf H P. 2016. Pruning filters for efficient convnets[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1608.08710.pdf

Li L S, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. 2017. Hyperband: a novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1): 6765-6816

Lin M B, Ji R R, Zhang Y X, Zhang B C, Wu Y J and Tian Y H. 2020. Channel pruning via automatic structure search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/2001.08565.pdf

Liu C X, Chen L C, Schroff F, Adam H, Hua W, Yuille A L and Li F F. 2019a. Auto-DeepLab: hierarchical neural architecture search for semantic image segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 82-92[DOI: 10.1109/CVPR.2019.00017]

Liu C X, Zoph B, Neumann M, Shlens J, Hua W, Li L J, Li F F, Yuille A, Huang J and Murphy K. 2018a. Progressive neural architecture search//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 19-35[DOI: 10.1007/978-3-030-01246-5_2]

Liu H, Simonyan K and Yang Y. 2018b. DARTS: differentiable architecture search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1806.09055.pdf

Liu N, Ma X L, Xu Z Y, Wang Y Z, Tang J and Ye J P. 2020. AutoCompress: an automatic DNN structured pruning framework for ultra-high compression rates//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Conference on Innovative Applications of Artificial Intelligence, the 10th Symposium on Educational Advances in Artificial Intelligence. Palo Alto, USA: AAAI: 4876-4883[DOI: 10.1609/aaai.v34i04.5924]

Liu Z C, Mu H Y, Zhang X Y, Guo Z C, Yang X, Cheng K T and Sun J. 2019b. MetaPruning: meta learning for automatic neural network channel pruning//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 3296-3305[DOI: 10.1109/ICCV.2019.00339]

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 21-37

Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440

Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110 [DOI:10.1023/B:VISI.0000029664.99615.94]

Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 483-499[DOI: 10.1007/978-3-319-46484-8_29]

Noh H, Hong S and Han B Y. 2015. Learning deconvolution network for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1520-1528[DOI: 10.1109/ICCV.2015.178]

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J and Chintala S. 2019. PyTorch: an imperative style, high-performance deep learning library[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1912.01703.pdf

Pham H, Guan M, Zoph B, Le Q V and Dean J. 2018. Efficient neural architecture search via parameters sharing[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1802.03268v2.pdf

Rawal A and Miikkulainen R. 2018. From nodes to networks: evolving recurrent neural networks[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1803.04439.pdf

Real E, Aggarwal A, Huang Y P and Le Q V. 2019. Regularized evolution for image classifier architecture search//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI: 4780-4789[DOI: 10.1609/aaai.v33i01.33014780]

Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788

Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. Mobilenetv2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520[DOI: 10.1109/CVPR.2018.00474]

Saxena S and Verbeek J. 2016. Convolutional neural fabrics//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 4053-4061

Sergeev A and Del Balso M. 2018. Horovod: fast and easy distributed deep learning in TensorFlow[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1802.05799.pdf

Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1409.1556.pdf

Singh P, Verma V K, Rai P and Namboodiri V P. 2019. Hetconv: heterogeneous kernel-based convolutions for deep CNNs//Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4835-4844

Snoek J, Larochelle H and Adams R P. 2012. Practical bayesian optimization of machine learning algorithms//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc.: 2951-2959

Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M M A, Prabhat and Adams R P. 2015. Scalable bayesian optimization using deep neural networks//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML: 2171-2180

Stamoulis D, Ding R Z, Wang D, Lymberopoulos D, Priyantha B, Liu J and Marculescu D. 2020. Single-path NAS: designing hardware-efficient ConvNets in less than 4 hours//Proceedings of 2019 European Conference on Machine Learning and Knowledge Discovery in Databases. Würzburg, Germany: Springer: 481-497[DOI: 10.1007/978-3-030-46147-8_29]

Stanley K O, D'Ambrosio D B, Gauci J. 2009. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2): 185-212 [DOI:10.1162/artl.2009.15.2.15202]

Stanley K O, Miikkulainen R. 2002. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2): 99-127 [DOI:10.1162/106365602320169811]

Sun K, Li M J, Liu D and Wang J D. 2018. Igcv3: interleaved low-rank group convolutions for efficient deep neural networks[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1806.00178.pdf

Swersky K, Snoek J and Adams R P. 2014. Freeze-thaw Bayesian optimization[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1406.3896.pdf

Szegedy C, Ioffe S, Vanhoucke V and Alemi A A. 2017. Inception-v4, inception-ResNet and the impact of residual connections on learning//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 4278-4284

Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]

Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z. 2016. Rethinking the inception architecture for computer vision//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2818-2826[DOI: 10.1109/CVPR.2016.308]

Tan M X, Chen B, Pang R M, Vasudevan V, Sandler M, Howard A and Le Q V. 2019. MnasNet: platform-aware neural architecture search for mobile//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 2820-2828[DOI: 10.1109/CVPR.2019.00293]

Wan A, Dai X L, Zhang P Z, He Z J, Tian Y D, Xie S N, Wu B C, Yu M, Xu T, Chen K, Vajda P and Gonzalez J E. 2020. FBNetV2: differentiable neural architecture search for spatial and channel dimensions//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 12962-12971[DOI: 10.1109/CVPR42600.2020.01298]

Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C H and Zhang Y N. 2020. NAS-FCOS: fast neural architecture search for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 11940-11948[DOI: 10.1109/CVPR42600.2020.01196]

Wu B C, Dai X L, Zhang P Z, Wang Y H, Sun F, Wu Y M, Tian Y D, Vajda P, Jia Y Q and Keutzer K. 2019. FBNet: hardware-aware efficient ConvNet design via differentiable neural architecture search//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 10734-10742[DOI: 10.1109/CVPR.2019.01099]

Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5987-5995[DOI: 10.1109/CVPR.2017.634]

Xie S, Zheng H, Liu C and Lin L. 2018. SNAS: stochastic neural architecture search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1812.09926.pdf

Xu H, Yao L W, Li Z G, Liang X D and Zhang W. 2019. Auto-FPN: automatic network architecture adaptation for object detection beyond classification//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 6648-6657[DOI: 10.1109/ICCV.2019.00675]

Yang T J, Howard A, Chen B, Zhang X, Go A, Sandler M, Sze V and Adam H. 2018. NetAdapt: platform-aware neural network adaptation for mobile applications//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 289-304[DOI: 10.1007/978-3-030-01249-6_18]

Ying C, Klein A, Christiansen E, Murphy K and Hutter F. 2019. NAS-Bench-101: towards reproducible neural architecture search//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: ICML: 63-77

Yu J H and Huang T. 2019a. Universally slimmable networks and improved training techniques//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1803-1811[DOI: 10.1109/ICCV.2019.00189]

Yu J H and Huang T. 2019b. AutoSlim: towards one-shot architecture search for channel numbers[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1903.11728.pdf

Yu J H, Yang L J, Xu N, Yang J C and Huang T. 2018. Slimmable neural networks[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1812.08928.pdf

Zela A, Klein A, Falkner S and Hutter F. 2018. Towards automated deep learning: efficient joint neural architecture and hyperparameter search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1807.06906.pdf

Zela A, Siems J and Hutter F. 2019. NAS-Bench-1Shot1: benchmarking and dissecting one-shot neural architecture search[EB/OL].[2020-04-24]. https://arxiv.org/pdf/2001.10422.pdf

Zheng X W, Ji R R, Tang L, Zhang B C, Liu J Z and Tian Q. 2019. Multinomial distribution learning for effective neural architecture search//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE: 1304-1313[DOI: 10.1109/ICCV.2019.00139]

Zhuo L A, Zhang B C, Chen H L, Yang L L, Chen C, Zhu Y J and Doermann D. 2020. CP-NAS: child-parent neural architecture search for binary neural networks[EB/OL].[2020-04-24]. https://arxiv.org/pdf/2005.00057

Zoph B and Le Q V. 2016. Neural architecture search with reinforcement learning[EB/OL].[2020-04-24]. https://arxiv.org/pdf/1611.01578

Zoph B, Vasudevan V, Shlens J and Le Q V. 2018. Learning transferable architectures for scalable image recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8697-8710[DOI: 10.1109/CVPR.2018.00907]