Paper title
Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learning
Paper authors
Paper abstract
The wide-scale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved. It is well known that no universally agreed Verification and Validation (VV) methodology guarantees absolute safety, which is crucial for the acceptance of this technology. The uncertainties in the behaviour of traffic participants and in the dynamic world cause stochastic reactions in advanced autonomous systems. The addition of Machine Learning (ML) algorithms and probabilistic techniques makes real-world testing significantly more complex than with traditional methods. Most research in this area focuses on generating challenging concrete scenarios or test cases to evaluate system performance, based on the frequency distributions of parameters extracted from real-world data. These approaches generally employ Monte Carlo simulation and importance sampling to generate critical cases. This paper presents an efficient falsification method to evaluate the System Under Test. The approach formulates the search for challenging scenarios as a parameter optimisation problem. The optimisation process aims to find the challenging case with the maximum return. A policy-gradient reinforcement learning algorithm is applied to enable the learning. The riskiness of a scenario is measured by the well-established Responsibility-Sensitive Safety (RSS) metric, the Euclidean distance, and the occurrence of a collision. We demonstrate that the proposed method searches more efficiently for challenging scenarios that could cause the system to fail to satisfy the safety requirements.
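To make the idea in the abstract concrete, the following is a minimal sketch of falsification as parameter optimisation with a policy-gradient update. It is not the paper's implementation. Everything specific here is an assumption made for illustration: the toy car-following simulator standing in for the System Under Test, the two scenario parameters (initial gap and lead-vehicle braking), the parameter bounds, the RSS constants, and the return, which combines only a normalised RSS-gap violation and a collision bonus (the Euclidean distance term mentioned in the abstract is omitted). The update is a REINFORCE-style score-function estimate for a Gaussian policy with a fixed standard deviation.

```python
import random

def rss_min_gap(v_rear, v_front, rho=1.0, a_accel=2.0, b_min=4.0, b_max=8.0):
    """Longitudinal RSS minimum safe gap (m); the constants are assumed values."""
    v_after = v_rear + rho * a_accel  # rear speed after the response time rho
    gap = (v_rear * rho + 0.5 * a_accel * rho ** 2
           + v_after ** 2 / (2 * b_min) - v_front ** 2 / (2 * b_max))
    return max(gap, 0.0)

def simulate(gap0, lead_brake, v0=20.0, dt=0.1, steps=100):
    """Toy stand-in for the System Under Test: ego car behind a braking lead car.
    Returns the scenario return: accumulated RSS violation plus a collision bonus."""
    ego_x, ego_v, lead_x, lead_v = 0.0, v0, gap0, v0
    ret = 0.0
    for _ in range(steps):
        gap = lead_x - ego_x
        if gap <= 0.0:
            return ret + 100.0  # collision: large bonus, episode ends
        needed = rss_min_gap(ego_v, lead_v)
        ret += max(needed - gap, 0.0) / max(needed, 1e-9) * dt
        ego_a = -4.0 if gap < needed else 0.0  # hypothetical ego controller
        ego_v = max(ego_v + ego_a * dt, 0.0)
        lead_v = max(lead_v - lead_brake * dt, 0.0)
        ego_x += ego_v * dt
        lead_x += lead_v * dt
    return ret

def search(iters=60, batch=20, lr=0.5, seed=0):
    """REINFORCE-style search over (initial gap, lead braking) for high-return scenarios."""
    rng = random.Random(seed)
    lo, hi = (5.0, 0.0), (80.0, 8.0)     # hypothetical parameter bounds
    mu, sigma = [50.0, 3.0], (5.0, 1.0)  # Gaussian policy: learnable mean, fixed std
    best = (float("-inf"), None)
    for _ in range(iters):
        samples = []
        for _ in range(batch):
            eps = [rng.gauss(0.0, 1.0) for _ in mu]
            theta = [min(max(m + s * e, l), h)
                     for m, s, e, l, h in zip(mu, sigma, eps, lo, hi)]
            ret = simulate(*theta)
            samples.append((ret, eps))
            if ret > best[0]:
                best = (ret, tuple(theta))
        mean = sum(r for r, _ in samples) / batch
        std = (sum((r - mean) ** 2 for r, _ in samples) / batch) ** 0.5 or 1.0
        for i in range(len(mu)):
            # Score-function estimate with a mean baseline and normalised advantages;
            # eps[i] / sigma[i] is the gradient of the Gaussian log-density w.r.t. mu[i],
            # and the step is rescaled by sigma[i]**2 to stay in parameter units.
            grad = sum((r - mean) / std * e[i] for r, e in samples) / batch
            mu[i] = min(max(mu[i] + lr * sigma[i] * grad, lo[i]), hi[i])
    return mu, best
```

Calling `search()` returns the learned policy mean and the highest-return scenario sampled along the way, as a `(return, (gap0, lead_brake))` pair. In this toy setting a return of 100 or more indicates a collision. Compared with uniform or Monte Carlo sampling of the same parameter ranges, the update concentrates samples where the return is high, which is the efficiency argument the abstract makes.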