Paper Title

Causal Imputation via Synthetic Interventions

Authors

Chandler Squires, Dennis Shen, Anish Agarwal, Devavrat Shah, Caroline Uhler

Abstract

Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only having to perform experiments on a small subset of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced synthetic interventions (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.
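A minimal sketch of the basic synthetic-interventions idea that the abstract's estimator builds on. It assumes a latent factor model Y[c, a, :] = Σ_r U[c, r] · V[a, r, :], where each context c has latent factors U[c] and each action a has a loading matrix V[a]. It also assumes a single "source" action observed both in the target context and in every donor context, and it fits donor weights with principal component regression. Under these assumptions, suppose the target context's factors are a linear combination of the donors' factors. Then the weights that reproduce the target's outcomes under the source action also reproduce its outcomes under any other action. This is not the authors' generalized estimator for arbitrary sparsity patterns. The helpers `pcr_weights` and `synthetic_intervention` and the toy data are hypothetical.

```python
import numpy as np


def pcr_weights(X, y, rank):
    """Principal component regression of y on the columns of X.

    Keeps only the top-`rank` singular directions of X (denoising under a
    low-rank model) and returns one weight per column, i.e. per donor.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
    return Vt_k.T @ ((U_k.T @ y) / s_k)


def synthetic_intervention(Y, observed, target, action, source, rank):
    """Estimate the outcome of `action` in context `target` (hypothetical helper).

    Y: (n_contexts, n_actions, d) outcomes; only entries with observed=True are read.
    observed: (n_contexts, n_actions) boolean mask of experiments actually run.
    source: an action observed in the target context and in every donor context.
    """
    if not observed[target, source]:
        raise ValueError("source action must be observed in the target context")
    donors = [c for c in range(Y.shape[0])
              if c != target and observed[c, source] and observed[c, action]]
    if len(donors) < rank:
        raise ValueError("need at least `rank` donor contexts")
    # Express the target context as a weighted combination of donors under the source action.
    beta = pcr_weights(Y[donors, source, :].T, Y[target, source, :], rank)
    # Reuse the same weights on the donors' outcomes under the action of interest.
    return Y[donors, action, :].T @ beta


# Toy data from a noise-free rank-3 latent factor model (assumed, not CMAP data):
# Y[c, a, :] = sum_r U[c, r] * V[a, r, :]
rng = np.random.default_rng(0)
n_contexts, n_actions, d, r = 8, 5, 50, 3
U = rng.normal(size=(n_contexts, r))
V = rng.normal(size=(n_actions, r, d))
Y_true = np.einsum("cr,ard->cad", U, V)

# Sparse design: some (context, action) pairs were never run; hide them as NaN.
observed = np.ones((n_contexts, n_actions), dtype=bool)
observed[[0, 2, 5], 3] = False
observed[[1, 4], 2] = False
Y_obs = np.where(observed[:, :, None], Y_true, np.nan)

# Impute the unobserved pair (context 0, action 3) using action 0 as the source.
pred = synthetic_intervention(Y_obs, observed, target=0, action=3, source=0, rank=r)
print(np.allclose(pred, Y_true[0, 3]))  # True for this noise-free toy model
```

Setting `rank` to the latent dimension keeps the signal directions and discards the rest. With noisy measurements, as in real gene-expression data, the imputation is approximate rather than exact.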