Paper Title
MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in fully decentralized learning. In this paper, we tackle the non-stationarity problem in the simplest and most fundamental way and propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a minimalist approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL requires only minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show that MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL despite such minimal changes.
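The following is a minimal sketch of the turn-based update scheme the abstract describes: only one agent updates its Q-function at a time while the others keep theirs fixed, so the learning agent faces a stationary problem during its turn. The two-agent cooperative matrix game, payoff matrix, learning rate, exploration rate, and turn length are illustrative assumptions, not the paper's experimental setup or exact algorithm.

```python
import numpy as np

np.random.seed(0)

n_actions = 3
# Shared payoff of a simple cooperative matrix game (hypothetical example).
payoff = np.array([[ 8.0, -12.0, -12.0],
                   [-12.0,   0.0,   0.0],
                   [-12.0,   0.0,   6.0]])

# One Q-table per agent over its own actions (stateless game for brevity).
Q = [np.zeros(n_actions), np.zeros(n_actions)]
alpha, eps = 0.1, 0.1            # learning rate and exploration (assumed values)
turn_len, n_turns = 200, 20      # steps per turn and number of alternations (assumed)

def act(q):
    """Epsilon-greedy action from a single agent's Q-table."""
    return np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(q))

for turn in range(n_turns):
    learner = turn % 2           # agents take turns: only this agent updates its Q-function
    for _ in range(turn_len):
        a = [act(Q[0]), act(Q[1])]
        r = payoff[a[0], a[1]]   # shared cooperative reward
        # Standard Q-learning update, applied only to the learning agent;
        # the other agent's Q-function (and hence its policy) stays fixed this turn.
        Q[learner][a[learner]] += alpha * (r - Q[learner][a[learner]])

print("Greedy joint action:", int(np.argmax(Q[0])), int(np.argmax(Q[1])))
```

Under this alternating schedule, the sketch differs from plain IQL only in the `learner = turn % 2` gate, mirroring the abstract's claim that MA2QL requires minimal changes to IQL.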