Paper Title
Multifold Cross-Validation Model Averaging for Generalized Additive Partial Linear Models
Paper Authors
Paper Abstract
Generalized additive partial linear models (GAPLMs) are appealing for model interpretation and prediction. However, for GAPLMs, the covariates and the degree of smoothing in the nonparametric parts are often difficult to determine in practice. To address this model selection uncertainty, we develop a computationally feasible model averaging (MA) procedure. The model weights are data-driven and selected by multifold cross-validation (CV) rather than leave-one-out CV, which saves computation. When all the candidate models are misspecified, we show that the proposed MA estimator for GAPLMs is asymptotically optimal in the sense of achieving the lowest possible Kullback-Leibler loss. In the other scenario, where the candidate model set contains at least one correct model, the weights chosen by multifold CV asymptotically concentrate on the correct models. As a by-product, we propose a variable importance measure based on the MA weights that quantifies the importance of the predictors in GAPLMs; it is shown to asymptotically identify the variables in the true model. Moreover, when the number of candidate models is very large, a model screening method is provided. Numerical experiments demonstrate the superiority of the proposed MA method over several existing model averaging and selection methods.
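A minimal illustrative sketch (not the authors' implementation) of the ideas in the abstract: multifold-CV model-averaging weights and a weight-based variable importance measure, under simplifying assumptions. It assumes a Gaussian response with identity link (so the Kullback-Leibler loss reduces to squared error), uses a truncated-power spline basis for the nonparametric part, and a small hypothetical candidate set; the names `basis`, `cv_predictions`, and `candidates` are invented for illustration.

```python
# A minimal illustrative sketch (not the authors' implementation): multifold-CV
# model-averaging weights and a weight-based variable-importance measure, in a
# simplified Gaussian / identity-link setting where the Kullback-Leibler loss
# reduces to squared error.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 300, 4
X = rng.uniform(-1.0, 1.0, size=(n, p))
y = 1.5 * X[:, 0] + np.sin(np.pi * X[:, 1]) + 0.5 * rng.normal(size=n)

# Hypothetical candidate models: (covariate subset, spline df for the smooth part).
candidates = [([0], 3), ([1], 3), ([0, 1], 3), ([0, 1], 6), ([0, 1, 2], 3)]

def basis(cols, df):
    """Design matrix: intercept, linear terms in `cols`, and a truncated-power
    spline basis with `df` knots for the first covariate in `cols` (the
    nonparametric component in this toy GAPLM)."""
    z = X[:, cols[0]]
    knots = np.quantile(z, np.linspace(0.1, 0.9, df))
    spline = np.maximum(z[:, None] - knots[None, :], 0.0) ** 3
    return np.hstack([np.ones((n, 1)), X[:, cols], spline])

designs = [basis(cols, df) for cols, df in candidates]

def cv_predictions(K=5):
    """Out-of-fold predictions from every candidate model (multifold CV)."""
    folds = np.array_split(rng.permutation(n), K)
    P = np.zeros((n, len(candidates)))
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        for m, B in enumerate(designs):
            coef, *_ = np.linalg.lstsq(B[train], y[train], rcond=None)
            P[test, m] = B[test] @ coef
    return P

P = cv_predictions()

# Weights on the unit simplex minimizing the multifold-CV criterion
# (held-out squared error of the averaged prediction).
M = len(candidates)
res = minimize(lambda w: np.mean((y - P @ w) ** 2),
               x0=np.full(M, 1.0 / M),
               bounds=[(0.0, 1.0)] * M,
               constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},))
w_hat = res.x

# Weight-based variable importance: total weight of the candidate models that
# include each covariate (a stand-in for the measure proposed in the paper).
importance = [sum(w for w, (cols, _) in zip(w_hat, candidates) if j in cols)
              for j in range(p)]
print("CV weights:", np.round(w_hat, 3))
print("variable importance:", np.round(importance, 3))
```

In this toy setting one would expect the weights, and hence the importance scores, to concentrate on candidate models containing covariates 0 and 1, loosely mirroring the asymptotic concentration property stated in the abstract.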