Paper Title
Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily because each agent must maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which the agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that compresses agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose an instance of the embedding based on recurrent neural networks (RNNs). The embedding is then used as an approximate information state and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black box of existing (partially observable) MARL algorithms, allowing us to establish some theoretical guarantees (error bounds on the value functions) while still achieving performance competitive with many end-to-end approaches.
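Illustrative sketch (not from the paper): the following minimal PyTorch example shows how an RNN-based embedding of an agent's local action-observation history, of the kind the abstract describes, might look. All names and dimensions here (HistoryEmbedding, obs_dim, act_dim, embed_dim) are hypothetical placeholders, not the authors' code.

import torch
import torch.nn as nn

class HistoryEmbedding(nn.Module):
    # Compresses an agent's local action-observation history into a
    # fixed-dimensional vector via a GRU, serving as an approximate
    # information state (an assumed architecture, not the paper's exact one).
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, embed_dim, batch_first=True)

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim); act_seq: (batch, T, act_dim)
        x = torch.cat([obs_seq, act_seq], dim=-1)
        _, h = self.rnn(x)        # final hidden state: (1, batch, embed_dim)
        return h.squeeze(0)       # fixed-size embedding, one per agent

# Usage: the embedding replaces the raw (exponentially growing) history
# and can be fed as the state input to any standard MARL algorithm.
embedder = HistoryEmbedding(obs_dim=8, act_dim=4, embed_dim=32)
obs_seq = torch.randn(2, 10, 8)    # 2 agents, horizon of 10 steps
act_seq = torch.randn(2, 10, 4)
z = embedder(obs_seq, act_seq)     # shape: (2, 32)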