Paper Title

Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing

Authors

Yang Ni, Danny Abraham, Mariam Issa, Yeseong Kim, Pietro Mercati, Mohsen Imani

Abstract

Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems, which generally involve complex decision-making processes. However, modern RL algorithms such as Deep Q-Networks (DQN) are built on deep neural networks and therefore incur high computational costs. In this paper, we propose QHD, an off-policy, value-based Hyperdimensional Reinforcement Learning method that mimics brain properties to achieve robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. On both desktop and power-limited embedded platforms, QHD achieves significantly better overall efficiency than DQN while providing higher or comparable rewards. QHD is also well suited to highly efficient reinforcement learning, with great potential for online and real-time learning. Our solution supports a small experience replay batch size, providing a 12.3x speedup over DQN while ensuring minimal quality loss. Our evaluation demonstrates QHD's capability for real-time learning, delivering a 34.6x speedup and significantly better learning quality than DQN.
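
To make the abstract's idea concrete, below is a minimal Python sketch of an off-policy, value-based learner built on a lightweight hyperdimensional model with small experience-replay batches. The encoder, the delta-rule update, and all names and parameters (QHDSketch, D, LR, batch_size) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import random
from collections import deque

import numpy as np

D = 2048        # hypervector dimensionality (illustrative choice)
GAMMA = 0.99    # discount factor
LR = 0.05       # learning rate for the delta-rule update


class QHDSketch:
    """Hypothetical hyperdimensional Q-learner; not the paper's exact model."""

    def __init__(self, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Random nonlinear projection: state vector -> D-dimensional hypervector.
        self.proj = rng.normal(size=(D, state_dim))
        self.bias = rng.uniform(0.0, 2.0 * np.pi, size=D)
        # One model hypervector per action; Q(s, a) = <encode(s), model[a]> / D.
        self.model = np.zeros((n_actions, D))
        self.buffer = deque(maxlen=10_000)  # experience replay memory

    def encode(self, s):
        return np.cos(self.proj @ np.asarray(s) + self.bias)

    def q_values(self, s):
        return self.model @ self.encode(s) / D

    def act(self, s, eps=0.1):
        # Epsilon-greedy behavior policy; learning itself is off-policy.
        if random.random() < eps:
            return random.randrange(len(self.model))
        return int(np.argmax(self.q_values(s)))

    def remember(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def update(self, batch_size=8):
        # Small replay batches, as the abstract highlights.
        if len(self.buffer) < batch_size:
            return
        for s, a, r, s2, done in random.sample(list(self.buffer), batch_size):
            target = r if done else r + GAMMA * np.max(self.q_values(s2))
            td_error = target - self.q_values(s)[a]
            # Delta rule: bundle the TD-error-scaled state hypervector
            # into the model hypervector of the taken action.
            self.model[a] += LR * td_error * self.encode(s)
```

In a training loop one would call act, step the environment, remember the transition, and call update each step; the small batch size keeps per-step cost low, consistent with the efficiency claims above.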
