Differential Advising in Multi-Agent Reinforcement Learning

Paper title

Differential Advising in Multi-Agent Reinforcement Learning

Authors

Dayong Ye, Tianqing Zhu, Zishuo Cheng, Wanlei Zhou, Philip S. Yu

Abstract

Agent advising is one of the main approaches to improving agent learning performance: it lets agents share advice with one another. Existing advising methods share a common limitation: an adviser agent can offer advice to an advisee agent only if the advice was created in the same state as the advisee's concerned state. In complex environments, requiring two states to be the same is a very strong condition, because a state may consist of multiple dimensions, and two states are the same only when all of these dimensions are correspondingly identical. This requirement may therefore limit the applicability of existing advising methods to complex environments. In this paper, inspired by the differential privacy scheme, we propose a differential advising method that relaxes this requirement by enabling agents to use advice in a state even if the advice was created in a slightly different state. Compared with existing methods, agents using the proposed method have more opportunities to take advice from others. This paper is the first to adopt the concept of differential privacy in advising to improve agent learning performance rather than to address security issues. The experimental results demonstrate that the proposed method is more efficient in complex environments than existing methods.
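
The relaxed matching idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it only shows an advisee accepting advice recorded in a state that differs from its own in at most a few dimensions, and it does not model any differential-privacy noise mechanism. The names `State`, `count_differing_dims`, `find_advice`, `advice_book`, and the `max_diff` threshold are all hypothetical, chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Assumption: a state is a fixed-length tuple of discrete dimensions.
State = Tuple[int, ...]


def count_differing_dims(a: State, b: State) -> int:
    """Return the number of dimensions in which two states differ."""
    if len(a) != len(b):
        raise ValueError("states must have the same number of dimensions")
    return sum(1 for x, y in zip(a, b) if x != y)


def find_advice(
    advice_book: Dict[State, str],
    state: State,
    max_diff: int = 1,
) -> Optional[str]:
    """Look up advice for `state`, tolerating slightly different states.

    An exact match is preferred. Otherwise the first recorded state that
    differs in at most `max_diff` dimensions is used. With max_diff=0 this
    reduces to the strict "same state" requirement of existing methods.
    """
    if state in advice_book:
        return advice_book[state]
    for known_state, action in advice_book.items():
        if count_differing_dims(known_state, state) <= max_diff:
            return action
    return None


# Hypothetical adviser knowledge: state -> recommended action.
advice_book = {(0, 0, 1): "left", (3, 2, 1): "right"}

exact = find_advice(advice_book, (0, 0, 1))               # identical state
near = find_advice(advice_book, (0, 1, 1))                # one dimension differs
far = find_advice(advice_book, (9, 9, 9))                 # no close state
strict = find_advice(advice_book, (0, 1, 1), max_diff=0)  # exact match only
```

With `max_diff=0` the advisee in state `(0, 1, 1)` gets no advice, while the relaxed lookup reuses the advice recorded for `(0, 0, 1)`, which is the extra opportunity to take advice that the abstract describes.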
