Paper Title
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policies of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), an actor-critic algorithm that exploits a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and is therefore the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns than seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to scale efficiently in games with a large number of agents.
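To make the property the abstract relies on concrete, the following minimal Python/NumPy sketch (not the paper's implementation) enumerates the pure Nash equilibria of a two-player Stag Hunt, a standard no-conflict matrix game, and picks out the Pareto-optimal one; the payoff values and the helper `pure_nash_equilibria` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Stag Hunt payoffs (hypothetical values for illustration): action 0 = stag, 1 = hare.
# row_payoff[a, b] is the row player's reward when row plays a and column plays b;
# col_payoff is the column player's reward. Both players most prefer (stag, stag),
# which is what makes this a no-conflict game.
row_payoff = np.array([[4.0, 0.0],
                       [3.0, 2.0]])
col_payoff = row_payoff.T  # symmetric game

def pure_nash_equilibria(row_p, col_p):
    """Joint actions from which neither player gains by deviating unilaterally."""
    eqs = []
    for a in range(row_p.shape[0]):
        for b in range(row_p.shape[1]):
            row_best = row_p[a, b] >= row_p[:, b].max()  # row player cannot do better
            col_best = col_p[a, b] >= col_p[a, :].max()  # column player cannot do better
            if row_best and col_best:
                eqs.append((a, b))
    return eqs

eqs = pure_nash_equilibria(row_payoff, col_payoff)
# In a no-conflict game, the equilibrium that maximises one player's return also
# maximises every other player's return, so it is the Pareto-optimal equilibrium.
pareto_optimal = max(eqs, key=lambda ab: row_payoff[ab])
print("Pure Nash equilibria:", eqs)                   # [(0, 0), (1, 1)]
print("Pareto-optimal equilibrium:", pareto_optimal)  # (0, 0): both hunt the stag
```

The dominated equilibrium (1, 1) is the "safe" outcome that uncertainty about the other agent's policy tends to push standard MARL algorithms towards, which is the failure mode Pareto-AC is designed to avoid.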