Paper title
Sparse dimension reduction based on energy and ball statistics
Paper authors
Paper abstract
As its name suggests, sufficient dimension reduction (SDR) aims to estimate, from data, a subspace that contains all the information needed to explain a dependent variable. Many approaches to SDR exist, and some of the most recent rely on minimal to no model assumptions. These are defined through an optimization criterion that maximizes a nonparametric measure of association. The original estimators are nonsparse, which means that all variables contribute to the model. In many practical applications, however, a sparse SDR technique is called for, one that intrinsically performs sufficient variable selection (SVS). This paper examines how such a sparse SDR estimator can be constructed. Three variants are investigated, each based on a different measure of association: distance covariance, martingale difference divergence, and ball covariance. A simulation study shows that each of these estimators can achieve correct variable selection in highly nonlinear contexts, yet each is sensitive to outliers and computationally intensive. The study sheds light on the subtle differences between the methods. Two examples illustrate how these new estimators can be applied in practice, with a slight preference for the option based on martingale difference divergence in the bioinformatics example.
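To make the first of the three association measures concrete, the following is a minimal sketch of the standard sample (V-statistic) squared distance covariance, used to score a single projection direction. It is not the estimator proposed in the paper: the function names, the toy data, and the idea of comparing two coordinate directions are assumptions made purely for illustration, and no sparsity penalty is included.

```python
import numpy as np


def centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a vector as n observations of one variable
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def distance_covariance_sq(x, y):
    """Sample squared distance covariance (V-statistic form)."""
    a = centered_distances(x)
    b = centered_distances(y)
    return float((a * b).mean())


def direction_score(w, X, y):
    """Illustrative helper (an assumption, not from the paper):
    association between the projection X @ w and the response y."""
    return distance_covariance_sq(X @ np.asarray(w, dtype=float), y)


# Toy data (hypothetical): y depends nonlinearly on the first variable only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(200)

score_relevant = direction_score([1, 0, 0, 0, 0], X, y)
score_irrelevant = direction_score([0, 1, 0, 0, 0], X, y)
print(score_relevant, score_irrelevant)
```

In this toy setting the direction aligned with the informative variable should receive the larger score, which is the kind of signal a sparse estimator could exploit to drive the remaining coefficients to zero. The computation forms full n-by-n distance matrices, which is consistent with the abstract's remark that these methods are computationally intensive.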