A survey of end-to-end autonomous driving systems

陈妍妍, 田大新, 林椿眄, 殷鸿博(北京航空航天大学)

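Abstract
In recent years, deep learning has driven the development and progress of end-to-end autonomous driving frameworks, giving rise to a series of innovative research topics and deployment solutions. This paper first starts from the classic modular system, briefly outlines the four functional modules of autonomous driving (perception, prediction, planning, and decision making), and analyzes the limitations of traditional modular and multi-task approaches. Second, it broadly surveys emerging end-to-end autonomous driving frameworks from input-output modality to system architecture, describes in detail the two mainstream paradigms of weakly interpretable end-to-end and modular joint end-to-end methods, and examines the shortcomings of existing work. Third, it briefly introduces open-loop and closed-loop evaluation methods for end-to-end autonomous driving systems and their applicable scenarios. Finally, it summarizes research on end-to-end autonomous driving systems and, from the perspectives of data mining and architecture design, looks ahead to potential challenges and key problems that remain to be solved.
Keywords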
A survey of end-to-end autonomous driving systems

Chen Yanyan, Tian Daxin, Lin Chunmian, Yin Hongbo (Beihang University)

Abstract
In recent years, deep learning has accelerated the development of end-to-end autonomous driving frameworks and given rise to a range of new research topics and deployment solutions. Most autonomous driving systems currently in use are modular systems built on a "divide and conquer" design: multiple independent but related modules are constructed, integrated into the software stack in a specific semantic or geometric order, and finally deployed on the vehicle. However, a mature modular design typically consists of thousands of components, which places a significant burden on the graphics memory and processing capacity of onboard automotive processors. Furthermore, the prediction errors inherent to each module accumulate as more modules are stacked, and upstream errors cannot be corrected by downstream modules, which poses a significant risk to vehicle safety. To reduce computational cost, multi-task architectures based on the "task parallelism" principle attach several decoding heads to a shared backbone network so that multiple tasks can be inferred efficiently in parallel. Nevertheless, the optimization objectives of different tasks may not align, and indiscriminate feature sharing can even degrade overall system performance. In contrast to these two architectures, the end-to-end paradigm uses a single large model to generate low-level control signals or a vehicle motion plan directly from inputs such as sensor data and vehicle status. This eliminates the information bottlenecks and cumulative errors introduced by integrating numerous intermediate components through rule-based interfaces, and allows the whole network to be optimized toward a unified objective.
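The concern about misaligned task objectives can be made concrete with a toy check. The sketch below is illustrative only and is not drawn from the surveyed works: the two quadratic "tasks", the function names, and the two-element shared parameter vector are all assumptions chosen so that the gradients can be written by hand. It measures the cosine similarity between the gradients that two tasks send into the same shared parameters; a negative value means that a step which helps one task hurts the other.

```python
import math

def grad_task_a(w):
    # Hypothetical task A: loss_a = sum((w_i - 1)^2), so d(loss_a)/dw_i = 2 * (w_i - 1).
    return [2.0 * (wi - 1.0) for wi in w]

def grad_task_b(w):
    # Hypothetical task B: loss_b = sum((w_i + 1)^2), so d(loss_b)/dw_i = 2 * (w_i + 1).
    return [2.0 * (wi + 1.0) for wi in w]

def cosine(u, v):
    # Cosine similarity between two gradient vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Shared "backbone" parameters sitting between the two task optima:
# the tasks pull in opposite directions, so the similarity is negative.
conflicting = cosine(grad_task_a([0.0, 0.0]), grad_task_b([0.0, 0.0]))

# Shared parameters far from both optima: the tasks agree on the direction,
# so the similarity is positive.
aligned = cosine(grad_task_a([3.0, 3.0]), grad_task_b([3.0, 3.0]))
```

In this toy setting the sign of the similarity depends on where the shared parameters sit relative to the two task optima, which is one simple way to see why feature sharing can help in some regimes and hurt in others.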
Early end-to-end designs, based on imitation learning and reinforcement learning, take sensor data as input and directly output the final control commands for steering, braking, and acceleration. Because such a "black box" network has no explicit representation of the driving scene, these approaches are referred to as weakly interpretable end-to-end methods: it is difficult for humans to understand the reasoning behind a vehicle's decision or prediction, which makes the system hard to debug, validate, and optimize. Worse, when the model malfunctions or an unexpected situation occurs, problems are difficult to detect, avoid, and repair in a timely manner, all of which is crucial for the safe operation of intelligent vehicles. In the conventional modular system, by contrast, decoupled components make it easy to develop and optimize individual modules, which gives each submodule stable performance and strong interpretability. However, that approach cannot pursue a unified goal at the optimization level; that is, it cannot jointly optimize and learn all modules toward the final planning objective. To give every module sufficient interpretability while keeping the whole system automatically optimizable, a modular joint end-to-end architecture that preserves the modular driving stack but makes each module differentiable appears to be a workable solution. The basic idea is to build a single neural network that connects all of the formerly independent modules, so that gradients from the planning module can propagate back to the raw sensor input and the system can be trained end to end. In other words, this kind of approach keeps the classic modular technology stack and modifies only how the submodules are connected: the previous explicit interfaces, which were rule-based and hand-crafted, are replaced by new implicit interfaces.
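The idea of letting the planning objective drive every upstream module can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: each "module" is a single scalar weight, the modules are chained as perception, prediction, and planning, and the gradients are derived by hand with the chain rule rather than by a deep learning framework. The point is only that the planning loss yields a non-zero gradient for the perception weight, so all three modules are updated toward the same objective.

```python
def forward(x, params):
    # Hypothetical three-stage differentiable stack with one weight per module.
    a, b, c = params
    feat = a * x      # perception: sensor input -> scene feature
    pred = b * feat   # prediction: scene feature -> forecast
    plan = c * pred   # planning: forecast -> planned value
    return feat, pred, plan

def planning_loss(x, target, params):
    # Squared error between the planned value and the target plan.
    _, _, plan = forward(x, params)
    return (plan - target) ** 2

def planning_gradients(x, target, params):
    # Backpropagate the planning loss through every module via the chain rule.
    a, b, c = params
    feat, pred, plan = forward(x, params)
    dloss_dplan = 2.0 * (plan - target)
    grad_c = dloss_dplan * pred    # planning weight
    dloss_dpred = dloss_dplan * c
    grad_b = dloss_dpred * feat    # prediction weight
    dloss_dfeat = dloss_dpred * b
    grad_a = dloss_dfeat * x       # perception weight, reached from the planning loss
    return [grad_a, grad_b, grad_c]

def train(x, target, params, lr=0.01, steps=200):
    # Plain gradient descent on the single planning objective.
    params = list(params)
    for _ in range(steps):
        grads = planning_gradients(x, target, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

initial = [1.0, 1.0, 1.0]
trained = train(x=1.0, target=2.0, params=initial)
loss_before = planning_loss(1.0, 2.0, initial)
loss_after = planning_loss(1.0, 2.0, trained)
```

A conventional modular pipeline would instead fit each weight against its own intermediate label, and the planning error would never reach the perception weight.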
Modular joint end-to-end methods offer a degree of interpretability because of the clear separation between modules. In essence, such an explicit end-to-end system is relatively decoupled within a unified overall design, and during inference it follows a logical sequence from perception to prediction and then to planning. When the model produces unexpected or uncontrollable results, it can be adjusted in a targeted way by examining the operational logic of the explicit pipeline. Furthermore, visualizing the internal features or intermediate results of specific tasks or modules helps analyze how decisions are made, which mitigates the risks posed by black-box models and supports the safe and efficient driving of intelligent vehicles. Against this background, this article presents a comprehensive survey of the emerging and promising field of end-to-end autonomous driving, summarizing the main technical routes and representative methods along the development path of end-to-end driving systems. Specifically, it begins with the classic modular system, briefly introduces the four functional modules of an autonomous driving system (perception, prediction, planning, and decision making), and analyzes the shortcomings of conventional modular and multi-task approaches. It then surveys emerging end-to-end autonomous driving frameworks from input-output modality to system architecture, describes the two dominant paradigms in detail, and examines the shortcomings of existing work.
Existing end-to-end architectures fall into two categories according to their interpretability: weakly interpretable end-to-end methods, which are examined from the perspectives of imitation learning, reinforcement learning, and interpretability; and modular joint end-to-end methods, which are traced from BEV (Bird's Eye View) representation, through joint perception and prediction, to planning-oriented end-to-end methods. The article then discusses open-loop and closed-loop evaluation of end-to-end driving systems, together with the scenarios to which each applies. Finally, it summarizes research on end-to-end autonomous driving systems and outlines the remaining challenges and key open problems from the perspectives of data mining and architecture design.
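The difference between the two evaluation modes can be sketched on a one-dimensional lane-keeping toy. This is an assumption-laden illustration rather than any benchmark's protocol: the "expert", the biased "learned" policy, and the update rule `offset + action` are all hypothetical. Open-loop evaluation scores the policy's actions against the expert on logged states, without executing them; closed-loop evaluation executes the policy, so each action changes the state the policy sees next.

```python
def expert_policy(offset):
    # Hypothetical expert: steer back toward the lane center (offset 0).
    return -0.5 * offset

def learned_policy(offset):
    # Hypothetical imperfect imitation of the expert with a constant steering bias.
    return -0.5 * offset + 0.1

def rollout(policy, start_offset, steps):
    # Closed loop: each action is applied and changes the next state.
    offsets = [start_offset]
    for _ in range(steps):
        offsets.append(offsets[-1] + policy(offsets[-1]))
    return offsets

def open_loop_error(policy, logged_offsets):
    # Open loop: compare actions with the expert on logged states only.
    errors = [abs(policy(s) - expert_policy(s)) for s in logged_offsets]
    return sum(errors) / len(errors)

# Log collected by driving the expert; the learned policy is scored on it offline.
expert_log = rollout(expert_policy, start_offset=1.0, steps=20)
offline_error = open_loop_error(learned_policy, expert_log)

# The same learned policy, now actually driving in the toy simulator.
expert_final = expert_log[-1]
learned_final = rollout(learned_policy, start_offset=1.0, steps=20)[-1]
```

In this toy the learned policy's per-step action error on the log is a constant bias, while in closed loop it settles at a persistent lateral offset that the expert does not have, an effect that only becomes visible once the policy's own actions determine the states it visits.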
Keywords
