Paper Title
A Reinforcement Learning Approach to Estimating Long-term Treatment Effects
Authors
Abstract
Randomized experiments (a.k.a. A/B tests) are a powerful tool for estimating treatment effects and informing decision making in business, healthcare, and other applications. In many problems, the treatment has a lasting effect that evolves over time. A limitation of randomized experiments is that they do not easily extend to measuring long-term effects, since running long experiments is time-consuming and expensive. In this paper, we take a reinforcement learning (RL) approach that estimates the average reward in a Markov process. Motivated by real-world scenarios where the observed state transition is nonstationary, we develop a new algorithm for a class of nonstationary problems, and demonstrate promising results on two synthetic datasets and one online store dataset.
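To make the idea of estimating a long-term effect as the average reward of a Markov process concrete, here is a minimal sketch. It is not the algorithm proposed in the paper: it assumes a stationary, tabular Markov chain, whereas the paper targets nonstationary transitions. The two-state user model, the transition matrices `P_CONTROL` and `P_TREATMENT`, the reward vector, and all function names are hypothetical and chosen only for illustration. The sketch fits a transition matrix per arm from short trajectories, then compares the long-run average rewards implied by the fitted chains.

```python
import random


def estimate_transitions(trajectories, n_states):
    """Maximum-likelihood transition matrix from observed state sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            counts[s][s_next] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: fall back to a uniform row for states never visited.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


def stationary_distribution(P, iters=1000):
    """Power iteration: repeatedly push a uniform distribution through P."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


def average_reward(P, rewards):
    """Long-run average reward: state rewards weighted by the stationary distribution."""
    d = stationary_distribution(P)
    return sum(p * r for p, r in zip(d, rewards))


def simulate(P, start, length, rng):
    """Sample one state trajectory of the given length from chain P."""
    traj = [start]
    for _ in range(length - 1):
        s = traj[-1]
        traj.append(rng.choices(range(len(P)), weights=P[s])[0])
    return traj


# Hypothetical toy model: state 0 = inactive user, state 1 = active user.
P_CONTROL = [[0.9, 0.1], [0.5, 0.5]]
P_TREATMENT = [[0.8, 0.2], [0.4, 0.6]]
REWARDS = [0.0, 1.0]  # reward is earned only while the user is active


def estimated_long_term_effect(n_users=2000, horizon=5, seed=0):
    """Estimate the long-term effect from short (horizon-step) experiments."""
    rng = random.Random(seed)
    estimate = {}
    for arm, P in (("control", P_CONTROL), ("treatment", P_TREATMENT)):
        trajs = [simulate(P, 0, horizon, rng) for _ in range(n_users)]
        P_hat = estimate_transitions(trajs, n_states=2)
        estimate[arm] = average_reward(P_hat, REWARDS)
    return estimate["treatment"] - estimate["control"]
```

In this toy setup every user starts inactive and is observed for only five steps, so a plain average of observed rewards would reflect the transient phase rather than the long run. Fitting the chain and reading off its stationary average reward is one way to extrapolate beyond the experiment window, provided the stationarity and Markov assumptions hold.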