Paper Title
MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in fully decentralized learning. In this paper, we tackle the non-stationarity problem in the simplest and most fundamental way and propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a minimalist approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL requires only minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show that MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL despite such minimal changes.
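The following is a minimal sketch of the turn-based update scheme the abstract describes: only one agent updates its Q-function at a time while the others keep theirs fixed, so the learning agent faces a stationary problem during its turn. The two-agent cooperative matrix game, payoff matrix, learning rate, exploration rate, and turn length are illustrative assumptions, not the paper's experimental setup or exact algorithm.

```python
import numpy as np

np.random.seed(0)

n_actions = 3
# Shared payoff of a simple cooperative matrix game (hypothetical example).
payoff = np.array([[ 8.0, -12.0, -12.0],
                   [-12.0,   0.0,   0.0],
                   [-12.0,   0.0,   6.0]])

# One Q-table per agent over its own actions (stateless game for brevity).
Q = [np.zeros(n_actions), np.zeros(n_actions)]
alpha, eps = 0.1, 0.1            # learning rate and exploration (assumed values)
turn_len, n_turns = 200, 20      # steps per turn and number of alternations (assumed)

def act(q):
    """Epsilon-greedy action from a single agent's Q-table."""
    return np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(q))

for turn in range(n_turns):
    learner = turn % 2           # agents take turns: only this agent updates its Q-function
    for _ in range(turn_len):
        a = [act(Q[0]), act(Q[1])]
        r = payoff[a[0], a[1]]   # shared cooperative reward
        # Standard Q-learning update, applied only to the learning agent;
        # the other agent's Q-function (and hence its policy) stays fixed this turn.
        Q[learner][a[learner]] += alpha * (r - Q[learner][a[learner]])

print("Greedy joint action:", int(np.argmax(Q[0])), int(np.argmax(Q[1])))
```

Under this alternating schedule, the sketch differs from plain IQL only in the `learner = turn % 2` gate, mirroring the abstract's claim that MA2QL requires minimal changes to IQL.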