Paper Title

Optimal Discriminant Analysis in High-Dimensional Latent Factor Models

Paper Authors

Xin Bing and Marten Wegkamp

Abstract

In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower-dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.
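The two-step procedure described above (project onto leading principal components, then classify in the reduced space) can be sketched as follows. This is a minimal illustration, not the authors' exact estimator: the simulated latent factor model, the loading matrix `A`, and the use of cross-validation as a stand-in for the paper's data-driven rule for selecting the number of retained PCs are all assumptions made for the example.

```python
# Illustrative sketch of a two-step PC-based classifier:
# (1) project high-dimensional features onto leading principal components,
# (2) run linear discriminant analysis on the projections.
# The number of retained PCs is chosen by cross-validation here,
# a simple stand-in for the paper's data-driven selection rule.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Simulated latent factor model (hypothetical parameters): p-dimensional
# features driven by K latent factors whose mean depends on the class label.
n, p, K = 200, 500, 3
y = rng.integers(0, 2, size=n)                 # binary class labels
A = rng.normal(size=(p, K))                    # factor loading matrix
shift = np.zeros(K)
shift[0] = 2.0                                 # class separation along factor 1
Z = rng.normal(size=(n, K)) + np.outer(y, shift)   # latent factors
X = Z @ A.T + rng.normal(size=(n, p))          # observed features (p >> K)

# Two-step classifier: PCA projection followed by LDA in the reduced space.
pipe = Pipeline([("pca", PCA()), ("lda", LinearDiscriminantAnalysis())])
search = GridSearchCV(pipe, {"pca__n_components": [1, 2, 3, 5, 10]}, cv=5)
search.fit(X, y)

print("retained PCs:", search.best_params_["pca__n_components"])
print("training accuracy:", search.score(X, y))
```

Because the class signal lives in a low-dimensional factor space, a handful of PCs typically suffices even though the ambient dimension p greatly exceeds the sample size n, which mirrors the regime analyzed in the paper.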
