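Tang Lang, Li Huixia, Yan Chenqian, Zheng Xiawu, Ji Rongrong (Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China)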
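Deep neural networks have made tremendous progress in artificial intelligence tasks such as image recognition, speech recognition, and machine translation, owing in large part to well-designed neural network architectures. Most neural networks are designed by hand, which requires expert machine learning knowledge and extensive trial and error. Automated neural architecture search has therefore become a research hotspot. Neural architecture search (NAS) consists of three parts: the search space, the search strategy, and the performance estimation method. In search space design, the entire network is usually not searched, for computational reasons; instead, the network is first divided into several blocks, and the structure inside the blocks is searched. Depending on the situation, the same structure can be shared across blocks, or a different structure can be searched for each block separately. For the search strategy, mainstream optimization methods include reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization. For performance estimation, to save computation time, each network is usually not trained to convergence; instead, methods such as weight sharing and early stopping are used to minimize the training time of each individual network. Deep neural networks obtained by NAS achieve better performance than manually designed networks. On the ImageNet classification task, MobileNetV3, obtained by NAS, reduces computation by nearly 30% compared with the manually designed MobileNetV2 while improving top-1 classification accuracy by 3.2%. On the Cityscapes semantic segmentation task, Auto-DeepLab-L, obtained by NAS, achieves a higher mean intersection over union (mIOU) than the manually designed DeepLabv3+ without ImageNet pretraining, while reducing computation by more than half. Because networks obtained by NAS usually outperform manually designed ones, NAS represents the future direction of neural network design.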
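To make the three components concrete, the following is a minimal Python sketch of a search loop. Everything in it is hypothetical and chosen only for illustration: the operation names in `OPS`, the fixed four-block layout, and `toy_proxy_score`, which replaces real training with a made-up score. Random search stands in as the simplest search strategy; a reinforcement-learning, evolutionary, Bayesian, or gradient-based optimizer would take the place of `random_search`.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical search space: the network is split into NUM_BLOCKS blocks and
# each block picks one operation from OPS (illustrative names only).
OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "identity"]
NUM_BLOCKS = 4

Architecture = List[str]


def sample_architecture(rng: random.Random) -> Architecture:
    """Search space: draw one operation per block, independently."""
    return [rng.choice(OPS) for _ in range(NUM_BLOCKS)]


def random_search(
    evaluate: Callable[[Architecture], float], num_trials: int, seed: int = 0
) -> Tuple[Optional[Architecture], float]:
    """Search strategy: random search that keeps the best-scoring candidate."""
    rng = random.Random(seed)
    best_arch: Optional[Architecture] = None
    best_score = float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)  # performance estimation step
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_proxy_score(arch: Architecture) -> float:
    """Hypothetical stand-in for performance estimation: no training happens;
    each op contributes a made-up accuracy gain minus a made-up cost."""
    gain = {"conv3x3": 1.0, "conv5x5": 1.2, "sep_conv3x3": 0.9,
            "max_pool3x3": 0.3, "identity": 0.1}
    cost = {"conv3x3": 0.2, "conv5x5": 0.5, "sep_conv3x3": 0.1,
            "max_pool3x3": 0.0, "identity": 0.0}
    return sum(gain[op] - cost[op] for op in arch)


if __name__ == "__main__":
    arch, score = random_search(toy_proxy_score, num_trials=50, seed=0)
    print(arch, round(score, 2))
```

Because every run draws candidates from the same seeded generator, a run with more trials sees the same first candidates plus extra ones, so its best score can only match or improve on a shorter run.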
Survey on neural architecture search
Tang Lang, Li Huixia, Yan Chenqian, Zheng Xiawu, Ji Rongrong (Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China)
Deep neural networks (DNNs) have achieved remarkable progress over the past years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One of the most crucial factors behind this progress is novel neural architectures, in which hierarchical feature extractors are learned from data in an end-to-end manner rather than designed manually. Neural network training can thus be viewed as an automatic feature engineering process, and its success has been accompanied by a growing demand for architecture engineering. At present, most neural networks are developed by human experts, a process that is time-consuming and error-prone. Consequently, interest in automated neural architecture search (NAS) methods has grown recently. NAS can be regarded as a subfield of automated machine learning, and it overlaps significantly with hyperparameter optimization and meta-learning. NAS methods can be characterized along three dimensions: search space, search strategy, and performance estimation strategy. The search space defines which architectures can be represented in principle, and its choice largely determines the difficulty of the optimization and the search time. To reduce search time, NAS is typically not applied to the entire network; instead, the network is divided into several blocks, and the search space is defined inside the blocks. The blocks are then assembled into a complete network according to a predefined pattern, which significantly shrinks the search space and therefore the search time. Depending on the situation, the searched block architecture may or may not be shared. If it is not shared, every block has its own architecture; if it is shared, all blocks in the network use the same architecture, so only one block architecture has to be found and search time is reduced further.
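The effect of block-based search and block sharing on the size of the search space can be illustrated with a small counting sketch. The block template is an assumption made purely for illustration (each node reads one earlier tensor and applies one of `num_ops` operations), not the template of any particular method, and the sizes of 4 nodes, 5 operations, and 3 blocks are arbitrary.

```python
def block_space_size(num_nodes: int, num_ops: int) -> int:
    """Hypothetical block template: node j (0-based) takes one input from any of
    the j + 1 earlier tensors (the block input or a previous node) and applies
    one of num_ops operations, giving prod_j (j + 1) * num_ops candidates."""
    size = 1
    for j in range(num_nodes):
        size *= (j + 1) * num_ops
    return size


def network_space_size(block_size: int, num_blocks: int, shared: bool) -> int:
    """Shared: one block architecture is reused, so the network space equals the
    block space. Unshared: every block is chosen independently."""
    return block_size if shared else block_size ** num_blocks


num_nodes, num_ops, num_blocks = 4, 5, 3
block = block_space_size(num_nodes, num_ops)
unshared = network_space_size(block, num_blocks, shared=False)
shared = network_space_size(block, num_blocks, shared=True)
# Searching the whole network as one flat graph of num_blocks * num_nodes nodes,
# where any node may read from any earlier node (same hypothetical template):
flat = block_space_size(num_blocks * num_nodes, num_ops)
print(f"block={block:,} shared={shared:,} unshared={unshared:,} flat={flat:,}")
```

With these numbers a block has 4! · 5⁴ = 15,000 candidates, so sharing keeps the network space at 15,000, not sharing raises it to 15,000³ ≈ 3.4 × 10¹², and a flat 12-node search reaches 12! · 5¹² ≈ 1.2 × 10¹⁷.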
The search strategy specifies how the search space is explored. Many search strategies can be used to explore the space of neural architectures, including random search, reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based optimization. Every search strategy faces the classical exploration-exploitation trade-off. The objective of neural architecture search is typically to find architectures that achieve high predictive performance on unseen data, and performance estimation refers to the process of estimating this performance. The most direct approach is to fully train each candidate architecture on the target data and then validate it, but this is extremely time-consuming, requiring on the order of thousands of graphics processing unit (GPU) days. Thus, candidates are generally not trained to convergence. Instead, performance estimation strategies rely on methods such as weight sharing, early stopping, or searching on smaller proxy datasets, which considerably reduce the training time needed to estimate each candidate's performance. Weight sharing can be achieved by inheriting weights from pretrained models or by training a one-shot model whose weights are shared across different architectures, each of which is merely a subgraph of the one-shot model. Early stopping estimates final performance from early-stage validation results, for example via learning curve extrapolation. Searching on a smaller proxy dataset finds an architecture on a small dataset, such as CIFAR-10, and that architecture is then trained on the large target dataset, such as ImageNet. Compared with neural networks developed by human experts, models found via neural architecture search exhibit better performance on various tasks, such as image classification, object detection, and semantic segmentation.
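Early stopping via learning curve extrapolation can be sketched as follows. The curve family acc(t) ≈ a + c/t, fitted by ordinary least squares, is an assumption chosen because it has a closed-form fit; practical methods generally fit richer parametric curve models. The function names are hypothetical and the accuracies in the usage lines are synthetic.

```python
from typing import Sequence


def extrapolate_final_accuracy(early_acc: Sequence[float], final_epoch: int) -> float:
    """Fit acc(t) ~ a + c / t to epochs t = 1..n by ordinary least squares on
    x = 1 / t, then evaluate the fitted curve at final_epoch."""
    n = len(early_acc)
    if n < 2:
        raise ValueError("need validation accuracy for at least two epochs")
    xs = [1.0 / t for t in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(early_acc) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, early_acc))
    c = sxy / sxx
    a = mean_y - c * mean_x
    return a + c / final_epoch


def should_stop_early(early_acc: Sequence[float], final_epoch: int,
                      best_final_acc: float, margin: float = 0.0) -> bool:
    """Stop training a candidate whose extrapolated final accuracy falls below
    the best accuracy seen so far by more than `margin`."""
    predicted = extrapolate_final_accuracy(early_acc, final_epoch)
    return predicted < best_final_acc - margin


# Usage: two synthetic candidates observed for 5 epochs out of a 100-epoch budget.
strong = [0.9 - 0.4 / t for t in range(1, 6)]
weak = [0.7 - 0.3 / t for t in range(1, 6)]
best_so_far = extrapolate_final_accuracy(strong, 100)  # about 0.896
print(should_stop_early(weak, 100, best_so_far))       # True: weak is pruned
```

Here the weak candidate is predicted to reach only about 0.697 after 100 epochs, so its remaining 95 epochs of training can be skipped; the `margin` parameter trades the risk of pruning a late bloomer against the training time saved.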
For example, on the ImageNet classification task, MobileNetV3, which was found via neural architecture search, requires approximately 30% fewer FLOPs than the manually designed MobileNetV2 while achieving 3.2% higher top-1 accuracy. On the Cityscapes segmentation task, Auto-DeepLab-L, found via neural architecture search, outperforms DeepLabv3+ without ImageNet pretraining while using less than half the multiply-add operations. In this survey, we review several neural architecture search methods and their applications, showing that networks found via neural architecture search outperform manually designed architectures on certain tasks, such as image classification, object detection, and semantic segmentation. However, little is yet understood about why specific architectures work well. Identifying common motifs, understanding why these motifs are important for high performance, and investigating whether they generalize across different problems are desirable directions for future research.