Paper Title

Latent Variable Representation for Reinforcement Learning

Authors

Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

Abstract

Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear, both theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle in the face of uncertainty for exploration. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models. Theoretically, we establish the sample complexity of the proposed approach in both the online and offline settings. Empirically, we demonstrate superior performance over current state-of-the-art algorithms across various benchmarks.
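To make the "optimism in the face of uncertainty" idea in the abstract concrete, the following is a minimal Python sketch of a generic UCB-style elliptical bonus on top of a linear value representation, as commonly used when Q-functions are linear in a learned feature map. The feature map phi(s, a) (standing in for the latent variable embedding), the bonus scale beta, and the ridge parameter lam are illustrative assumptions, not the paper's exact construction.

import numpy as np

def ucb_bonus(phi_sa: np.ndarray, cov_inv: np.ndarray, beta: float) -> float:
    # Elliptical confidence bonus: beta * sqrt(phi^T Lambda^{-1} phi).
    return beta * np.sqrt(phi_sa @ cov_inv @ phi_sa)

def optimistic_q(phi_sa, w, cov_inv, beta, reward):
    # Optimistic value estimate: immediate reward + linear Q + exploration bonus.
    return reward + phi_sa @ w + ucb_bonus(phi_sa, cov_inv, beta)

# Usage: accumulate the regularized covariance Lambda = lam*I + sum_t phi_t phi_t^T
# over observed transitions, then act greedily w.r.t. the optimistic Q.
d, lam, beta = 8, 1.0, 0.5
rng = np.random.default_rng(0)
features = rng.normal(size=(100, d))   # hypothetical phi(s_t, a_t) from a trained encoder
cov = lam * np.eye(d) + features.T @ features
cov_inv = np.linalg.inv(cov)
w = rng.normal(size=d)                 # hypothetical least-squares value weights
print(optimistic_q(features[0], w, cov_inv, beta, reward=0.1))

The bonus shrinks for state-action pairs whose features have been observed often, so the agent is drawn toward under-explored regions; flipping the bonus's sign yields the pessimistic variant used in offline settings.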
