On solutions of the distributional Bellman equation

Title

On solutions of the distributional Bellman equation

Authors

Julian Gerstenberg, Ralph Neininger, Denis Spiegel

Abstract

In distributional reinforcement learning, not only expected returns but the complete return distributions of a policy are taken into account. The return distribution for a fixed policy is given as the solution of an associated distributional Bellman equation. In this note we consider general distributional Bellman equations and study existence and uniqueness of their solutions as well as tail properties of return distributions. We give necessary and sufficient conditions for existence and uniqueness of return distributions and identify cases of regular variation. We link distributional Bellman equations to multivariate affine distributional equations. We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation. This makes the general theory of such equations applicable to the distributional reinforcement learning setting.
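To make the distinction between expected returns and full return distributions concrete, the following is a minimal sketch and is not taken from the paper. It assumes a hypothetical two-state Markov reward process under a fixed policy, with deterministic rewards and a constant discount factor (the names `P`, `REWARD`, `GAMMA`, `sample_return`, and `expected_returns` are all illustrative). It draws Monte Carlo samples of the discounted return and compares their mean with the solution of the ordinary, expected-value Bellman equation.

```python
import random

# Hypothetical two-state Markov reward process under a fixed policy (assumption).
# P[s] lists (next_state, probability); REWARD[s] is the reward collected in state s.
P = {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.2), (1, 0.8)]}
REWARD = {0: 1.0, 1: 0.0}
GAMMA = 0.9  # constant discount factor (assumption)


def sample_return(state, rng, horizon=200):
    """Draw one sample of the discounted return, truncated after `horizon` steps."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * REWARD[state]
        discount *= GAMMA
        next_states, probs = zip(*P[state])
        state = rng.choices(next_states, weights=probs)[0]
    return total


def expected_returns(iterations=2000):
    """Solve the expected-value Bellman equation by fixed-point iteration."""
    v = {s: 0.0 for s in P}
    for _ in range(iterations):
        v = {s: REWARD[s] + GAMMA * sum(p * v[t] for t, p in P[s]) for s in P}
    return v


rng = random.Random(0)
samples = [sample_return(0, rng) for _ in range(5000)]
empirical_mean = sum(samples) / len(samples)
value = expected_returns()

# The samples approximate the whole return distribution from state 0;
# the expected-value Bellman equation only recovers its mean.
print(f"empirical mean: {empirical_mean:.3f}, Bellman value: {value[0]:.3f}")
print(f"sample range: [{min(samples):.3f}, {max(samples):.3f}]")
```

The list `samples` carries information that the single number `value[0]` discards, such as spread and tails. In this bounded toy example the returns always lie in a finite interval, so it does not exhibit the heavy-tailed (regularly varying) behaviour the abstract mentions.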
