Enhanced risk perception method based on parallel vision for autonomous vehicles in safety-critical scenarios

Gou Chao1, Liu Xinxin1, Guo Zipeng1, Zhou Yuchen1, Wang Feiyue2 (1. School of Intelligent Systems Engineering, Sun Yat-sen University; 2. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences)

Abstract
Objective With the rapid development of visual perception technology, autonomous driving can already be applied in simple scenarios. However, challenges remain in real-world complex urban road applications, especially in safety-critical scenarios such as sudden lane changes by other vehicles, pedestrians stepping into the road, and the appearance of obstacles. Because data for such critical scenarios follow a long-tail distribution in the real world, data-driven risk perception for autonomous driving faces a technical bottleneck. This paper therefore proposes an enhanced risk perception method based on parallel vision. Method Built on the interactive ACP theory, the method integrates descriptive, prescriptive, and predictive intelligence under the parallel vision framework to achieve vision-based enhanced risk perception. Specifically, based on descriptive and prescriptive learning, an improved diffusion model is introduced into the artificial image system, with a background-adaptive module and a feature-fusion encoder; by controlling the specific locations where dangerous elements such as pedestrians are generated, risk sequences for safety-critical scenarios can be generated controllably. Second, a spatial-rule-based method extracts the spatial and interactive relationships between traffic entities to construct cognitive scene graphs. Finally, under the predictive learning framework, a new graph-model-based enhanced risk perception method is proposed that combines a relational graph attention network with a Transformer encoder module to perform spatio-temporal modeling of scene graph sequence data, ultimately achieving risk perception and prediction. Result To verify the effectiveness of the proposed method, comparative experiments against five mainstream risk perception methods were conducted on three datasets (MRSG-144, IESG, and 1043-carla-sg). The proposed method achieves F1-scores of 0.956, 0.944, and 0.916 on the three datasets, surpassing existing mainstream methods and achieving the best results. Conclusion This work is a practical application of parallel vision to risk perception for autonomous driving, and is of great significance for improving risk perception in complex traffic scenarios and ensuring the safety of autonomous driving systems.
Keywords
Enhanced risk perception method based on parallel vision for autonomous vehicles in safety-critical scenarios

Gou Chao1, Liu Xinxin1, Guo Zipeng1, Zhou Yuchen1, Wang Feiyue2 (1. School of Intelligent Systems Engineering, Sun Yat-sen University; 2. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences)

Abstract
Objective With the rapid development of visual perception technology, autonomous driving can already be applied in simple scenarios. However, challenges remain in real-world complex urban road applications, especially in safety-critical scenarios such as sudden lane changes by other vehicles, pedestrians stepping into the road, and the appearance of obstacles. First, most existing autonomous driving systems are still trained and evaluated mainly on everyday natural scenes or heuristically generated adversarial scenes. Safety-critical scenarios, that is, scenes in which the vehicle is at imminent risk of collision, and especially scenes involving vulnerable road users such as pedestrians, play an important role in evaluating the safety performance of autonomous driving systems. However, such scenarios occur with low probability in the real world, and the corresponding data exhibit a long-tail distribution, so data-driven risk perception for autonomous driving faces a technical bottleneck. Second, current scene generation methods and rule-based frameworks for automatically generating virtual simulation scenes struggle to create genuinely new scenes, and the driving scenes they generate are often insufficiently realistic and lack diversity. In contrast, scene generation based on the diffusion model can fully exploit the characteristics of real data and fill the gaps in the collected real data, while keeping the generation process interpretable and controllable. In addition, in safety-critical scenarios, the system's risk perception capability remains limited.
For risk-aware safety assessment, traditional methods based on convolutional neural networks can extract simple features of each object in the scene but cannot obtain higher-level semantic information, namely the relationships between traffic entities. Obtaining such high-level information remains a challenge, since most potential risks are hidden at the semantic and behavioral levels. Risk assessment for autonomous driving based on traffic scene graphs has therefore become a hot research topic in recent years: by constructing and analyzing traffic scene graphs and capturing the relationships and interactions in the traffic scene as a whole, potential risks can be better understood and predicted, providing the autonomous driving system with more accurate decision support. From the perspective of human drivers' visual perception, different traffic entities pose different risks to an autonomous vehicle. However, existing risk perception methods based on traffic scene graphs generally use graph convolution to iteratively update the feature representation of each node, which ignores the importance of the different types of edges between nodes during message passing. Motivated by these challenges, this paper proposes an enhanced risk perception framework based on parallel vision that automatically generates safety-critical scene data and accounts for the importance of the different edge types between adjacent traffic entities. Method The method builds on the interactive ACP theory and integrates descriptive, prescriptive, and predictive intelligence under a parallel vision framework to achieve vision-based enhanced risk perception. Specifically, based on descriptive and prescriptive learning, a background-adaptive module and a feature-fusion encoder are introduced into the diffusion model, thereby refining the boundary contours of generated pedestrians and improving image quality.
By controlling the specific locations where dangerous elements such as pedestrians are generated, risk sequences for safety-critical scenarios can be generated controllably. Second, a cognitive scene graph construction method based on spatial rules obtains the spatial position of each entity in the scene through object detection; based on relative spatial positions and preset thresholds, the distance, orientation, and affiliation relationships between entities in the traffic scene are extracted. Interactive relationships are extracted mainly from changes in the spatial information of traffic entities over time. Finally, under the predictive learning framework, a new graph-model-based enhanced risk perception method is proposed that integrates a relational graph attention network and a Transformer encoder module to perform spatio-temporal modeling of scene graph sequence data. The relational graph attention network (RGAT) introduces an attention mechanism that assigns different weights to different neighborhood relations and obtains each node's feature representation by weighted summation. The temporal Transformer encoder module models the temporal dynamics of the scene graph sequence and ultimately outputs the risk-aware visual reasoning result. Result To verify the effectiveness of the method, experiments on three datasets (MRSG-144, IESG, and 1043-carla-sg) compared its performance with five mainstream risk perception methods based on graph-structured data. The proposed method achieves F1-scores of 0.956, 0.944, and 0.916 on the three datasets, surpassing the existing mainstream methods and achieving the best results. Ablation experiments further reveal the contribution of each module to the model's performance.
Introducing virtual scene data significantly boosts the performance of the risk perception model, increasing Acc (accuracy) by 0.4%, AUC (area under curve) by 1.1%, and F1-score by 1.2%. Conclusion This work is a practical application of parallel vision to risk perception for autonomous driving, and is of great significance for enhancing the risk perception capability of autonomous vehicles in complex traffic scenarios and ensuring the safety of autonomous driving systems.
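The spatial-rule-based scene graph construction described above derives relations such as distance and orientation from detected entity positions and preset thresholds. The sketch below is a minimal illustration of that idea; the threshold values, relation names, and coordinate convention (ego-centred frame, x forward, y left) are all assumptions for illustration, not the paper's actual rule set.

```python
import math

# Hypothetical thresholds -- illustrative only, not the paper's values.
NEAR_DIST = 10.0          # metres: closer than this counts as "near"
FRONT_HALF_ANGLE = 45.0   # degrees: bearing cone treated as "in front"

def spatial_relations(ego, other):
    """Derive coarse distance and orientation relations between two entities.

    ego, other: dicts with 'x', 'y' positions in an ego-centred frame
    (x forward, y left), as a toy stand-in for object detection output.
    """
    dx = other["x"] - ego["x"]
    dy = other["y"] - ego["y"]
    dist = math.hypot(dx, dy)

    rels = ["near" if dist < NEAR_DIST else "far"]

    bearing = math.degrees(math.atan2(dy, dx))  # 0 deg = straight ahead
    if abs(bearing) <= FRONT_HALF_ANGLE:
        rels.append("in_front_of")
    elif abs(bearing) >= 180.0 - FRONT_HALF_ANGLE:
        rels.append("behind")
    else:
        rels.append("left_of" if bearing > 0 else "right_of")
    return rels

# A pedestrian 6 m ahead and 1 m to the left of the ego vehicle.
print(spatial_relations({"x": 0.0, "y": 0.0}, {"x": 6.0, "y": 1.0}))
```

Interaction relations would then follow from how these per-frame relations change over time (e.g. a pedestrian whose distance relation flips from "far" to "near" across consecutive frames).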
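The RGAT step described above assigns different attention weights to different neighborhood relations and aggregates neighbor features by weighted summation. The following is a heavily simplified, dependency-free sketch of that idea for a single node update; the per-relation parameters in REL_ATTN, the scoring form, and the toy 2-d features are invented for illustration and are not the paper's model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-relation attention parameters: each edge type gets its own
# score weights, so relation type influences how much a neighbor is attended to.
REL_ATTN = {
    "near": [1.0, 0.5],
    "far":  [0.2, 0.1],
}

def rgat_node_update(h_i, neighbors):
    """Relation-aware attention aggregation for one node.

    h_i: feature of the node being updated (2-d list).
    neighbors: list of (relation, feature) pairs for its neighbors.
    The attention score of each neighbor depends on the relation type of the
    connecting edge; the update is the attention-weighted sum of neighbors.
    """
    scores = [
        sum(a * (hi + hj) for a, hi, hj in zip(REL_ATTN[rel], h_i, h_j))
        for rel, h_j in neighbors
    ]
    alphas = softmax(scores)
    return [
        sum(alpha * h_j[d] for alpha, (_, h_j) in zip(alphas, neighbors))
        for d in range(len(h_i))
    ]

# Neighbors connected by a "near" edge receive higher attention than "far" ones.
print(rgat_node_update([0.5, 0.5], [("near", [1.0, 0.0]), ("far", [0.0, 1.0])]))
```

In the full model, one such relation-aware update runs per frame's scene graph, and the resulting per-frame node embeddings feed the temporal Transformer encoder.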
Keywords
