Paper Title


Reinforcement Learned Distributed Multi-Robot Navigation with Reciprocal Velocity Obstacle Shaped Rewards

Paper Authors

Ruihua Han, Shengduo Chen, Shuaijun Wang, Zeqing Zhang, Rui Gao, Qi Hao, Jia Pan

Paper Abstract


The challenge in solving the collision avoidance problem lies in adaptively choosing optimal robot velocities in complex scenarios full of interactive obstacles. In this paper, we propose a distributed approach for multi-robot navigation that combines the concept of the reciprocal velocity obstacle (RVO) with the scheme of deep reinforcement learning (DRL) to solve the reciprocal collision avoidance problem under limited information. The novelty of this work is threefold: (1) using a set of sequential VO and RVO vectors to represent the interactive environmental states of static and dynamic obstacles, respectively; (2) developing a neural network based on a bidirectional recurrent module, which maps the states of a varying number of surrounding obstacles directly to actions; (3) developing a reward function based on the RVO area and the expected collision time to encourage reciprocal collision avoidance behaviors and to trade off collision risk against travel time. The proposed policy is trained in simulated scenarios and updated by an actor-critic-based DRL algorithm. We validate the policy in complex environments with varying numbers of differential-drive robots and obstacles. The experimental results demonstrate that our approach outperforms state-of-the-art methods and other learning-based approaches in terms of success rate, travel time, and average speed. The source code of this approach is available at https://github.com/hanruihua/rl_rvo_nav.
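The RVO membership test and the expected-collision-time quantity that underlie the state representation and the reward can be illustrated with a minimal 2-D sketch. This is not the authors' implementation: the function names are hypothetical, agents are assumed to be holonomic discs, and only the standard RVO definition (a candidate velocity v is in RVO_AB iff the reciprocally adjusted relative velocity 2v − v_A − v_B leads to a collision) is reproduced.

```python
import math

def time_to_collision(p_rel, v_rel, r_sum):
    """Smallest t >= 0 with |p_rel - t * v_rel| <= r_sum, or inf if none.

    p_rel: position of the other agent relative to this one;
    v_rel: velocity of this agent relative to the other;
    r_sum: sum of the two disc radii.
    Solves the quadratic |p_rel - t * v_rel|^2 = r_sum^2.
    """
    px, py = p_rel
    vx, vy = v_rel
    a = vx * vx + vy * vy
    b = -2.0 * (px * vx + py * vy)
    c = px * px + py * py - r_sum * r_sum
    if c <= 0.0:                 # discs already overlap
        return 0.0
    if a == 0.0:                 # no relative motion, no future collision
        return math.inf
    disc = b * b - 4.0 * a * c
    if disc < 0.0:               # the relative-velocity ray misses the disc
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf  # negative root: collision in the past

def in_rvo(v_cand, v_a, v_b, p_rel, r_sum):
    """True if candidate velocity v_cand for agent A lies inside RVO_AB."""
    # Reciprocal adjustment: both agents are assumed to share the
    # avoidance effort, so the effective relative velocity is
    # 2 * v_cand - v_a - v_b rather than v_cand - v_b.
    u = (2.0 * v_cand[0] - v_a[0] - v_b[0],
         2.0 * v_cand[1] - v_a[1] - v_b[1])
    return time_to_collision(p_rel, u, r_sum) < math.inf
```

For two discs of radius 0.5 with B two metres ahead of A, keeping A's current velocity (1, 0) stays inside the RVO (head-on collision in one second), while steering to (0, 1) leaves it; a reward term can then penalize velocities in the RVO in inverse proportion to `time_to_collision`.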
