Paper Title

Multicast Scheduling over Multiple Channels: A Distribution-Embedding Deep Reinforcement Learning Method

Paper Authors

Ran Li, Chuan Huang, Xiaoqi Qin, Shengpei Jiang

Paper Abstract

Multicasting is an efficient technique for simultaneously transmitting common messages from the base station (BS) to multiple mobile users (MUs). Multicast scheduling over multiple channels, which aims to jointly minimize the energy consumption of the BS and the latency of serving asynchronous requests from the MUs, is formulated as an infinite-horizon Markov decision process (MDP) problem with a large discrete action space, multiple time-varying constraints, and multiple time-invariant constraints. To address these challenges, this paper proposes a novel distribution-embedding multi-agent proximal policy optimization (DE-MAPPO) algorithm, which consists of a modified MAPPO module and a distribution-embedding module: the former handles the large discrete action space and the time-varying constraints by modifying the structure of the actor networks and the training kernel of conventional MAPPO, while the latter iteratively adjusts the action distribution to satisfy the time-invariant constraints. Moreover, a performance upper bound on the considered MDP is derived by solving a two-step optimization problem. Finally, numerical results demonstrate that the proposed algorithm outperforms existing ones and achieves performance comparable to the derived benchmark.
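The abstract does not detail how the distribution-embedding module enforces the time-invariant constraints. As a minimal illustrative sketch of the general idea only (not the paper's algorithm), the Python snippet below iteratively tilts a discrete action distribution until a generic expected-cost constraint is met; the function name, cost vector, budget, and step size are all assumptions introduced for illustration.

```python
import numpy as np

def embed_distribution(probs, action_cost, budget, step=0.1, max_iters=100):
    """Illustrative sketch only: iteratively tilt a categorical action
    distribution until its expected cost falls under a fixed (time-invariant)
    budget, then renormalize. The cost model and update rule are placeholders,
    not the constraint handling used in the paper."""
    probs = np.asarray(probs, dtype=float).copy()
    action_cost = np.asarray(action_cost, dtype=float)
    for _ in range(max_iters):
        expected_cost = float(probs @ action_cost)
        if expected_cost <= budget:
            break  # the constraint is satisfied in expectation
        # Exponentially down-weight costly actions and renormalize;
        # each step lowers the expected cost of the distribution.
        probs *= np.exp(-step * (action_cost - expected_cost))
        probs /= probs.sum()
    return probs

# Toy usage: a policy head proposes a distribution over 5 discrete actions
# whose expected "cost" must stay below a hypothetical budget of 2.0.
policy_probs = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
costs = np.array([0.5, 1.0, 2.5, 3.0, 4.0])
adjusted = embed_distribution(policy_probs, costs, budget=2.0)
print(adjusted, adjusted @ costs)
```

In a DE-MAPPO-style pipeline, such an adjustment would sit between the actor's output distribution and the sampled action; the exact update rule and the form of the constraints are specified in the full paper, not here.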
