Paper title
From controlled to undisciplined data: estimating causal effects in the era of data science using a potential outcome framework
Paper authors
Abstract
This paper discusses the fundamental principles of causal inference, the area of statistics that estimates the effect of specific occurrences, treatments, interventions, and exposures on a given outcome from experimental and observational data. We explain the key assumptions required to identify causal effects and highlight the challenges associated with the use of observational data. We emphasize that experimental thinking is crucial in causal inference. The quality of the data (not necessarily the quantity), the study design, the degree to which the assumptions are met, and the rigor of the statistical analysis allow us to credibly infer causal effects. Although we advocate leveraging big data and applying machine learning (ML) algorithms to estimate causal effects, these are not a substitute for thoughtful study design. Concepts are illustrated via examples.