Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

Paper Title

Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

Authors

Baker, Bowen

Abstract

Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments. However, the real world is not zero-sum, nor does it have fixed teams; humans face numerous social dilemmas and must learn when to cooperate and when to compete. To successfully deploy agents into the human world, it may be important that they be able to understand and help in our conflicts. Unfortunately, selfish MARL agents typically fail when faced with social dilemmas. In this work, we show evidence of emergent direct reciprocity, indirect reciprocity and reputation, and team formation when training agents with randomized uncertain social preferences (RUSP), a novel environment augmentation that expands the distribution of environments agents play in. RUSP is generic and scalable; it can be applied to any multi-agent environment without changing the original underlying game dynamics or objectives. In particular, we show that with RUSP these behaviors can emerge and lead to higher social welfare equilibria in both classic abstract social dilemmas like Iterated Prisoner's Dilemma as well as in more complex intertemporal environments.


Paper Title