Paper Title
Control of Continuous Quantum Systems with Many Degrees of Freedom based on Convergent Reinforcement Learning

Author

Wang, Zhikang

Abstract


With the development of experimental quantum technology, quantum control has attracted increasing attention due to the realization of controllable artificial quantum systems. However, because quantum-mechanical systems are often too difficult to deal with analytically, heuristic strategies and numerical algorithms that search for proper control protocols are adopted, and deep learning, especially deep reinforcement learning (RL), is a promising generic candidate for such control problems. Although there have been a few successful applications of deep RL to quantum control problems, most existing RL algorithms suffer from instabilities and unsatisfactory reproducibility, and they require extensive fine-tuning and a large computational budget, both of which limit their applicability. To resolve the issue of instabilities, in this dissertation, we investigate the non-convergence issue of Q-learning. We then examine the weaknesses of existing convergent approaches and develop a new convergent Q-learning algorithm, which we call the convergent deep Q network (C-DQN) algorithm, as an alternative to the conventional deep Q network (DQN) algorithm. We prove the convergence of C-DQN and apply it to the Atari 2600 benchmark, showing that C-DQN still learns successfully where DQN fails. We then apply the algorithm to the measurement-feedback cooling problems of a quantum quartic oscillator and a trapped quantum rigid body. We establish the physical models and analyse their properties, and we show that although both C-DQN and DQN can learn to cool the systems, C-DQN tends to behave more stably and achieves better performance when DQN suffers from instabilities. As the performance of DQN can have a large variance and lack consistency, C-DQN can be a better choice for research on complicated control problems.
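The non-convergence issue of Q-learning mentioned in the abstract stems from the semi-gradient nature of the standard update: the bootstrap target is treated as a constant, so the update is not a true gradient descent and can diverge under function approximation. As a minimal illustration (our own sketch, not the dissertation's C-DQN algorithm), the tabular semi-gradient update can be contrasted with a full-gradient update on the squared Bellman error in the spirit of residual-gradient methods:

```python
import numpy as np

def semi_gradient_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning / DQN-style update: the bootstrap target
    r + gamma * max_a' Q(s', a') is held fixed (no gradient through it)."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def full_gradient_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Full-gradient descent on the squared Bellman error: the target is
    also differentiated, so the successor value gets a correction term.
    This is a true gradient method and hence convergent (at the cost of
    possibly converging to a biased fixed point under stochasticity)."""
    a_next = int(np.argmax(Q[s_next]))
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * delta
    Q[s_next, a_next] -= alpha * gamma * delta
```

C-DQN itself combines the two kinds of losses to retain both convergence and learning efficiency; the sketch above only shows the underlying distinction between semi-gradient and full-gradient updates that motivates it.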