Paper Title

Improving Efficiency of Training a Virtual Treatment Planner Network via Knowledge-guided Deep Reinforcement Learning for Intelligent Automatic Treatment Planning of Radiotherapy

Authors

Chenyang Shen, Liyuan Chen, Yesenia Gonzalez, Xun Jia

Abstract


We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) was built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS). Despite the success, the training of VTPN via DRL was time-consuming. Also, the training time is expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a knowledge-guided DRL (KgDRL) scheme that incorporated knowledge from human planners to guide the training process and improve training efficiency. Using prostate cancer intensity-modulated radiation therapy as a testbed, we first summarized a number of rules for operating our in-house TPS. In training, in addition to randomly navigating the state-action space via the epsilon-greedy algorithm, as in standard DRL, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new policies not covered by the rules. We trained a VTPN using KgDRL and compared its performance with another VTPN trained using DRL. Both VTPNs spontaneously learned to operate the TPS to generate high-quality plans, achieving plan quality scores of 8.82 (KgDRL) and 8.43 (DRL). Both VTPNs outperformed treatment planning based purely on the rules, which achieved a plan score of 7.81. The VTPN trained with 8 episodes using KgDRL performed similarly to one trained using DRL with 100 episodes, reducing the training time from more than a week to 13 hours. The proposed KgDRL framework accelerated the training process by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.
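The abstract describes exploration that mixes standard epsilon-greedy sampling with rule-suggested actions, where the rule-sampling priority decays over training. Below is a minimal, hypothetical sketch of that selection logic; the function names, signatures, and decay schedule are illustrative assumptions, not the authors' actual implementation.

```python
import random


def select_action(q_values, rule_action, epsilon, rule_priority, rng=random):
    """Knowledge-guided epsilon-greedy action selection (illustrative sketch).

    q_values      : list of Q-value estimates, one per action (from the VTPN).
    rule_action   : action index suggested by the human-planner rules.
    epsilon       : exploration probability, as in standard epsilon-greedy.
    rule_priority : probability of following the rule when exploring; decayed
                    over training so the VTPN explores beyond the rules.
    """
    if rng.random() < epsilon:                 # explore
        if rng.random() < rule_priority:       # guided by planner rules
            return rule_action
        return rng.randrange(len(q_values))    # uniform random action
    # exploit: greedy action under the current network
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay(value, rate=0.99, floor=0.0):
    """Simple exponential decay schedule (an assumed form) for
    epsilon and rule_priority between training episodes."""
    return max(floor, value * rate)
```

With `epsilon = 0` the selection is purely greedy; with a high `rule_priority` early in training, exploration is biased toward the planner rules, and as `rule_priority` decays the behavior approaches ordinary epsilon-greedy DRL.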
