Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning

Paper Title


Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning

Authors

Yangang Ren, Guojian Zhan, Liye Tang, Shengbo Eben Li, Jianhua Jiang, Jingliang Duan

Abstract


Among driving scenarios, intersections are particularly challenging: the interplay between signal lights and diverse traffic participants makes it difficult to learn a wise and robust driving policy. Current research rarely considers the diversity of intersections or the stochastic behaviors of traffic participants. In practical applications, this randomness often leads to devastating events, so it should be a central concern of autonomous driving. This paper introduces an adversarial learning paradigm to improve the intelligence and robustness of driving policies at signalized intersections with dense traffic flow. First, we design a static path planner that generates trackable candidate paths for multiple intersections with diverse topologies. Next, a constrained optimal control problem (COCP) is built on these candidate paths, in which bounded uncertainty in the dynamic models is considered to capture the randomness of the driving environment. We propose adversarial policy gradient (APG) to solve the COCP: an adversarial policy introduces disturbances by seeking the most severe uncertainty, while the driving policy learns to handle these situations through this competition. Finally, a comprehensive system is established for training and testing, which includes a perception module and incorporates human experience to resolve the yellow-light dilemma. Experiments indicate that the trained policy handles signal lights flexibly while passing through intersections smoothly and efficiently in a human-like manner. Moreover, APG substantially improves resistance to abnormal behaviors and thus ensures a high level of safety for the autonomous vehicle.
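The adversarial idea in the abstract (an adversary seeks the most severe bounded uncertainty while the driving policy learns to handle it) can be illustrated with a toy gradient descent-ascent loop. This is a minimal sketch under stated assumptions, not the authors' APG: the one-step car-following scenario, the scalar "policy" (the ego speed `v`), the cost, and all constants (`D0`, `V_LEAD`, `V_REF`, `DT`, `BETA`) are hypothetical. Both players share one cost `J(v, w)`; the adversary maximizes it over a lead-vehicle speed perturbation `w` restricted to `[-eps, eps]`, and the driving policy minimizes it.

```python
import math

# Hypothetical toy scenario (not from the paper): the ego vehicle follows a
# lead vehicle whose speed is perturbed by a bounded disturbance w in [-eps, eps].
D0 = 4.0      # initial gap to the lead vehicle [m]
V_LEAD = 8.0  # nominal lead-vehicle speed [m/s]
V_REF = 10.0  # desired ego speed [m/s]
DT = 1.0      # one-step horizon [s]
BETA = 50.0   # weight of the collision-risk penalty


def gap_after_step(v: float, w: float) -> float:
    """Gap to the lead vehicle after one step of length DT."""
    return D0 + (V_LEAD + w - v) * DT


def cost(v: float, w: float) -> float:
    """Speed-tracking cost plus an exponential penalty on a small gap."""
    return (v - V_REF) ** 2 + BETA * math.exp(-gap_after_step(v, w))


def grad_v(v: float, w: float) -> float:
    """dJ/dv for the driving policy parameter (ego speed)."""
    return 2.0 * (v - V_REF) + BETA * DT * math.exp(-gap_after_step(v, w))


def grad_w(v: float, w: float) -> float:
    """dJ/dw for the adversary; negative here, since a slower lead is riskier."""
    return -BETA * DT * math.exp(-gap_after_step(v, w))


def train(eps: float, steps: int = 2000, lr_v: float = 0.02, lr_w: float = 0.5):
    """Alternating gradient descent-ascent on the shared cost J(v, w)."""
    v, w = V_REF, 0.0
    for _ in range(steps):
        # Adversary: projected gradient ascent keeps |w| <= eps (bounded uncertainty).
        w = min(eps, max(-eps, w + lr_w * grad_w(v, w)))
        # Driving policy: gradient descent against the current worst disturbance.
        v -= lr_v * grad_v(v, w)
    return v, w


v_nom, _ = train(eps=0.0)      # no adversary: nominal policy
v_rob, w_rob = train(eps=2.0)  # adversarially trained policy
print(f"nominal v={v_nom:.2f}, robust v={v_rob:.2f}, adversary w={w_rob:.1f}")
print(f"worst-case cost: nominal={cost(v_nom, -2.0):.3f}, robust={cost(v_rob, -2.0):.3f}")
```

In this toy, the adversary drives `w` to the lower bound (a sudden slowdown of the lead vehicle), and the adversarially trained policy settles on a lower speed whose worst-case cost is smaller than that of the nominal policy. A scalar parameter with analytic gradients stands in for the neural-network policies and the constrained multi-step problem described above.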
