Paper title
Statistical modeling: the three cultures
Authors
Abstract
Two decades ago, Leo Breiman identified two cultures for statistical modeling. The data modeling culture (DMC) refers to practices that aim to conduct statistical inference on one or several quantities of interest. The algorithmic modeling culture (AMC) refers to practices that define a machine-learning (ML) procedure to generate accurate predictions about an event of interest. Breiman argued that statisticians should give more attention to AMC than to DMC, because of the strengths of ML in adapting to data. Twenty years later, although DMC has lost some of its dominant role in statistics because of the data-science revolution, we observe that this culture is still the leading practice in the natural and social sciences. DMC is the modus operandi because of the influence of the established scientific method, called the hypothetico-deductive scientific method. Despite the incompatibilities of AMC with this scientific method, AMC and DMC mix intensely among some research groups. We argue that this mixing has formed a fertile spawning pool for a mutated culture that we call the hybrid modeling culture (HMC), in which prediction and inference have fused into new procedures that reinforce one another. This article identifies key characteristics of HMC, thereby facilitating the scientific endeavor and fueling the evolution of statistical cultures towards better practices. By better, we mean increasingly reliable, valid, and efficient statistical practices in analyzing causal relationships. In combining inference and prediction, the result of HMC is that the distinction between prediction and inference, taken to its limit, melts away. We qualify our melting-away argument by describing three HMC practices, each of which captures an aspect of the scientific cycle: ML for causal inference, ML for data acquisition, and ML for theory prediction.