Paper Title

State Advantage Weighting for Offline RL

Paper Authors

Jiafei Lyu, Aicheng Gong, Le Wan, Zongqing Lu, Xiu Li

Paper Abstract

We present state advantage weighting for offline reinforcement learning (RL). In contrast to the action advantage $A(s,a)$ commonly adopted in QSA learning, we leverage the state advantage $A(s,s^\prime)$ and QSS learning for offline RL, hence decoupling the action from the values. We expect the agent to reach high-reward states, with the action determined by how the agent gets to the corresponding state. Experiments on D4RL datasets show that our proposed method achieves remarkable performance against common baselines. Furthermore, our method shows good generalization capability when transferring from offline to online.
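The sketch below illustrates one way the state-advantage weighting described in the abstract could be instantiated; it is not the paper's implementation. The MLP critics $Q(s,s^\prime)$ and $V(s)$, the inverse-dynamics model that recovers the action linking $s$ to $s^\prime$, the temperature beta, and the weight-clipping constant are all illustrative assumptions.

```python
# A minimal, self-contained sketch (not the authors' code) of state-advantage
# weighting with QSS learning: A(s, s') = Q(s, s') - V(s), used as an
# exponential weight exp(A / beta) on an action-recovery (inverse-dynamics)
# regression.  All hyperparameters below are placeholders.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

state_dim, action_dim, beta = 17, 6, 3.0

q_ss = mlp(2 * state_dim, 1)                     # Q(s, s'): value of reaching s' from s
v_s = mlp(state_dim, 1)                          # V(s): state-value baseline
inverse_model = mlp(2 * state_dim, action_dim)   # recovers the action given (s, s')

def state_advantage_weights(s, s_next):
    """w = exp(A(s, s') / beta) with A(s, s') = Q(s, s') - V(s), clipped for stability."""
    with torch.no_grad():
        adv = q_ss(torch.cat([s, s_next], dim=-1)) - v_s(s)
        return torch.clamp(torch.exp(adv / beta), max=100.0)

def weighted_action_loss(s, s_next, a):
    """State-advantage-weighted regression of the action that links s to s'."""
    w = state_advantage_weights(s, s_next)
    a_pred = inverse_model(torch.cat([s, s_next], dim=-1))
    return (w * ((a_pred - a) ** 2).sum(dim=-1, keepdim=True)).mean()

# Toy offline batch (random placeholders standing in for dataset transitions).
s = torch.randn(32, state_dim)
s_next = torch.randn(32, state_dim)
a = torch.randn(32, action_dim)
print(weighted_action_loss(s, s_next, a).item())
```

The key point the sketch captures is the decoupling: the weight depends only on states via $A(s,s^\prime)$, while the action enters only through the model that reproduces how the agent moves from $s$ to $s^\prime$.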
