A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning

Paper Title

A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning

Authors

Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht

Abstract

Open ad hoc teamwork is the problem of training a single agent to efficiently collaborate with an unknown group of teammates whose composition may change over time. A variable team composition creates challenges for the agent, such as the requirement to adapt to new team dynamics and to deal with changing state vector sizes. These challenges are aggravated in real-world applications in which the controlled agent only has a partial view of the environment. In this work, we develop a class of solutions for open ad hoc teamwork under full and partial observability. We start by developing a solution for the fully observable case that leverages graph neural network architectures to obtain an optimal policy based on reinforcement learning. We then extend this solution to partially observable scenarios by proposing different methodologies that maintain belief estimates over the latent environment states and team composition. These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork. Empirical results demonstrate that our solution can learn efficient policies in open ad hoc teamwork in fully and partially observable cases. Further analysis demonstrates that our methods' success is a result of effectively learning the effects of teammates' actions while also inferring the inherent state of the environment under partial observability.
