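Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications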

Title

Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications

Authors

Xu, Bin; Hou, Jun; Shi, Junzhe; Li, Huayi; Rathod, Dhruvang; Wang, Zhe; Filipi, Zoran

Abstract

Reinforcement Learning (RL) is widely used in robotics and is gradually being adopted for Hybrid Electric Vehicle (HEV) supervisory control. Although RL performs very well at minimizing fuel consumption in simulation, it needs a large number of learning iterations and therefore a long learning time, which makes it hard to apply in real-world vehicles. In addition, fuel consumption during the initial learning phases is much worse than with baseline controls. This study aims to reduce the number of learning iterations of Q-learning in HEV applications and to improve fuel consumption in the initial learning phases by using warm start methods. Unlike previous studies, which initialized Q-learning with zero or random Q values, this study initializes Q-learning from different supervisory controls (i.e., Equivalent Consumption Minimization Strategy control and heuristic control) and gives a detailed analysis. The results show that the proposed warm start Q-learning requires 68.8% fewer iterations than cold start Q-learning. The trained Q-learning controller is validated on two different driving cycles, where it shows a 10-16% MPG improvement over Equivalent Consumption Minimization Strategy control. Furthermore, real-time feasibility is analyzed, and guidance for vehicle implementation is provided. The results of this study can be used to facilitate the deployment of RL in vehicle supervisory control applications.
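The difference between a cold and a warm start can be illustrated with a minimal tabular sketch. This is not the authors' procedure. It shows one simple way to seed a Q-table so that the initial greedy policy reproduces a baseline controller. The state and action discretization, the `heuristic_policy` rule, the `bonus` value, and the learning parameters are all hypothetical choices made for illustration.

```python
def warm_start_q_table(n_states, n_actions, baseline_policy, bonus=1.0):
    """Seed a Q-table so the greedy action in each state matches a baseline.

    A cold start would leave every entry at zero (or random); here the
    baseline's preferred action gets a small positive value instead.
    The seeding rule and `bonus` are assumptions, not taken from the paper.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        q[s][baseline_policy(s)] = bonus
    return q


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update, applied in place."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def greedy_action(q, s):
    """Return the index of the highest-valued action in state s."""
    return max(range(len(q[s])), key=lambda a: q[s][a])


def heuristic_policy(s):
    # Hypothetical rule: in low state-of-charge bins (s < 2) pick action 0
    # (engine-dominant), otherwise pick action 1 (motor-dominant).
    return 0 if s < 2 else 1


q = warm_start_q_table(n_states=4, n_actions=2, baseline_policy=heuristic_policy)
print([greedy_action(q, s) for s in range(4)])  # [0, 0, 1, 1]
```

With this seeding, the agent behaves like the baseline controller before any learning has taken place, and subsequent calls to `q_update` refine the table from that starting point rather than from zero.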
