Paper Title

Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction

Paper Authors

Masashi Okada, Tadahiro Taniguchi

Abstract

In the present paper, we propose a decoder-free extension of Dreamer, a leading model-based reinforcement learning (MBRL) method from pixels. Dreamer is a sample- and cost-efficient solution to robot learning: it trains a latent state-space model based on a variational autoencoder and conducts policy optimization by latent trajectory imagination. However, this autoencoding-based approach often causes object vanishing, in which the autoencoder fails to perceive key objects for solving control tasks, significantly limiting Dreamer's potential. This work aims to relieve this bottleneck and enhance Dreamer's performance by removing the decoder. For this purpose, we first derive a likelihood-free, InfoMax objective of contrastive learning from Dreamer's evidence lower bound. Second, we incorporate two components, (i) independent linear dynamics and (ii) random-crop data augmentation, into the learning scheme to improve training performance. In comparison to Dreamer and other recent model-free reinforcement learning methods, our newly devised Dreamer with InfoMax and without a generative decoder (Dreaming) achieves the best scores on five difficult simulated robotics tasks, on which Dreamer suffers from object vanishing.
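The two ingredients the abstract names, a contrastive (InfoNCE-style) objective over latents and random-crop data augmentation on observations, can be sketched roughly as follows. This is a minimal NumPy illustration of the general techniques, not the authors' implementation; all function names, shapes, and the temperature parameter are assumptions.

```python
import numpy as np

def random_crop(images, crop_size, rng):
    """Randomly crop each image in a batch (B, H, W, C) to crop_size x crop_size.
    Used as data augmentation on pixel observations before encoding."""
    b, h, w, c = images.shape
    out = np.empty((b, crop_size, crop_size, c), dtype=images.dtype)
    for i in range(b):
        top = rng.integers(0, h - crop_size + 1)
        left = rng.integers(0, w - crop_size + 1)
        out[i] = images[i, top:top + crop_size, left:left + crop_size, :]
    return out

def infonce_loss(z_pred, z_obs, temperature=1.0):
    """Likelihood-free contrastive objective: the model-predicted latent
    z_pred[i] should score highest against its matching observation
    embedding z_obs[i]; the other batch entries act as negatives."""
    logits = z_pred @ z_obs.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on diagonal pairs
```

Minimizing this loss maximizes a lower bound on the mutual information between predicted latents and observations, which is why no pixel-level decoder (and hence no reconstruction) is needed.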
