Paper Title

Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference

Paper Authors

Hao Sun, Taiyi Wang

Paper Abstract

Although it is well known that exploration plays a key role in Reinforcement Learning (RL), prevailing exploration strategies for continuous control tasks in RL are mainly based on naive isotropic Gaussian noise, regardless of the causal relationship between the action space and the task, and treat all dimensions of actions as equally important. In this work, we propose to conduct interventions on the primal action space to discover the causal relationship between the action space and the task reward. We propose the State-Wise Action-Refined (SWAR) method, which addresses the issue of action space redundancy and promotes causality discovery in RL. We formulate causality discovery in RL tasks as a state-dependent action space selection problem and propose two practical algorithms as solutions. The first approach, TD-SWAR, detects task-related actions during temporal difference learning, while the second approach, Dyn-SWAR, reveals important actions through dynamic model prediction. Empirically, both methods provide ways to understand the decisions made by RL agents and improve learning efficiency in action-redundant tasks.
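As a rough illustration of the contrast described in the abstract, the sketch below compares naive isotropic Gaussian exploration, which perturbs every action dimension equally, with a hypothetical state-dependent action mask that perturbs only the dimensions assumed to be task-relevant for the current state. This is a minimal sketch of the general idea, not the authors' implementation: the mask values, action dimensionality, and function names are illustrative assumptions.

```python
# Minimal sketch (assumed names and values, not the paper's code): contrast
# isotropic Gaussian exploration with a state-wise mask over action dimensions.
import numpy as np


def isotropic_gaussian_exploration(action, sigma=0.1, rng=None):
    """Baseline: add the same Gaussian noise to every action dimension."""
    rng = rng or np.random.default_rng()
    return action + rng.normal(0.0, sigma, size=action.shape)


def state_wise_refined_exploration(action, state_mask, sigma=0.1, rng=None):
    """Hypothetical state-wise refinement: only dimensions the mask marks as
    task-relevant for the current state receive exploration noise; dimensions
    assumed redundant are left untouched."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=action.shape)
    return action + noise * state_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    action = np.zeros(4)                   # 4-dimensional continuous action (assumed)
    mask = np.array([1.0, 0.0, 1.0, 0.0])  # dims 1 and 3 treated as redundant here (assumed)
    print(isotropic_gaussian_exploration(action, rng=rng))
    print(state_wise_refined_exploration(action, mask, rng=rng))
```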
