Paper Title

Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing

Authors

Yang Ni, Danny Abraham, Mariam Issa, Yeseong Kim, Pietro Mercati, Mohsen Imani

Abstract

Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems, which generally involve complex decision-making processes. However, modern RL algorithms such as Deep Q-Networks (DQN) are built on deep neural networks and therefore incur high computational costs. In this paper, we propose QHD, an off-policy, value-based Hyperdimensional Reinforcement Learning method that mimics brain properties to achieve robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. On both desktop and power-limited embedded platforms, QHD achieves significantly better overall efficiency than DQN while providing higher or comparable rewards. QHD is also well suited to highly efficient reinforcement learning, with great potential for online and real-time learning. Our solution supports a small experience replay batch size, providing a 12.3x speedup over DQN while ensuring minimal quality loss. Our evaluation demonstrates QHD's capability for real-time learning, delivering a 34.6x speedup and significantly better learning quality than DQN.
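
To make the abstract's idea concrete, below is a minimal Python sketch of an off-policy, value-based learner built on a lightweight hyperdimensional model with small experience-replay batches. The encoder, the delta-rule update, and all names and parameters (QHDSketch, D, LR, batch_size) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import random
from collections import deque

import numpy as np

D = 2048        # hypervector dimensionality (illustrative choice)
GAMMA = 0.99    # discount factor
LR = 0.05       # learning rate for the delta-rule update


class QHDSketch:
    """Hypothetical hyperdimensional Q-learner; not the paper's exact model."""

    def __init__(self, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Random nonlinear projection: state vector -> D-dimensional hypervector.
        self.proj = rng.normal(size=(D, state_dim))
        self.bias = rng.uniform(0.0, 2.0 * np.pi, size=D)
        # One model hypervector per action; Q(s, a) = <encode(s), model[a]> / D.
        self.model = np.zeros((n_actions, D))
        self.buffer = deque(maxlen=10_000)  # experience replay memory

    def encode(self, s):
        return np.cos(self.proj @ np.asarray(s) + self.bias)

    def q_values(self, s):
        return self.model @ self.encode(s) / D

    def act(self, s, eps=0.1):
        # Epsilon-greedy behavior policy; learning itself is off-policy.
        if random.random() < eps:
            return random.randrange(len(self.model))
        return int(np.argmax(self.q_values(s)))

    def remember(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def update(self, batch_size=8):
        # Small replay batches, as the abstract highlights.
        if len(self.buffer) < batch_size:
            return
        for s, a, r, s2, done in random.sample(list(self.buffer), batch_size):
            target = r if done else r + GAMMA * np.max(self.q_values(s2))
            td_error = target - self.q_values(s)[a]
            # Delta rule: bundle the TD-error-scaled state hypervector
            # into the model hypervector of the taken action.
            self.model[a] += LR * td_error * self.encode(s)
```

In a training loop one would call act, step the environment, remember the transition, and call update each step; the small batch size keeps per-step cost low, consistent with the efficiency claims above.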
