Paper Title

Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings

Authors

Abhishek Chakrabortty, Guorong Dai, Raymond J. Carroll

Abstract

We consider quantile estimation in a semi-supervised setting, characterized by two available data sets: (i) a small or moderate sized labeled data set containing observations for a response and a set of possibly high dimensional covariates, and (ii) a much larger unlabeled data set where only the covariates are observed. We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets, to improve the estimation accuracy compared to the supervised estimator, i.e., the sample quantile from the labeled data. These estimators use a flexible imputation strategy applied to the estimating equation along with a debiasing step that allows for full robustness against misspecification of the imputation model. Further, a one-step update strategy is adopted to enable easy implementation of our method and handle the complexity from the non-linear nature of the quantile estimating equation. Under mild assumptions, our estimators are fully robust to the choice of the nuisance imputation model, in the sense of always maintaining root-n consistency and asymptotic normality, while having improved efficiency relative to the supervised estimator. They also attain semi-parametric optimality if the relation between the response and the covariates is correctly specified via the imputation model. As an illustration of estimating the nuisance imputation function, we consider kernel smoothing type estimators on lower dimensional and possibly estimated transformations of the high dimensional covariates, and we establish novel results on their uniform convergence rates in high dimensions, involving responses indexed by a function class and usage of dimension reduction techniques. These results may be of independent interest. Numerical results on both simulated and real data confirm our semi-supervised approach's improved performance, in terms of both estimation and inference.
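
The abstract describes the estimator only in words: impute the quantile estimating function, debias it with labeled residuals, then take one update step. The sketch below illustrates that idea under simplifying assumptions that do not come from the paper. The covariate is a single scalar, standing in for a lower-dimensional transformation of X. The imputation P(Y <= theta | X) uses Nadaraya-Watson smoothing with a Gaussian kernel, and two-fold cross-fitting on the labeled data supplies the residuals. The density of Y at the initial estimate comes from a Gaussian kernel density estimate. The bandwidths h_x and h_y and all function names (ss_quantile, nw_cdf, kde) are hypothetical. Concretely, the estimator starts from the supervised sample quantile theta0. It evaluates the debiased score: the mean of phi(X, theta0) over the unlabeled data, plus the labeled mean of I(Y <= theta0) - phi(X, theta0), minus tau. It then returns theta0 minus this score divided by the estimated density. This is an illustration, not the authors' implementation.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def sample_quantile(values, tau):
    """Supervised estimator: the order-statistic sample quantile."""
    s = sorted(values)
    k = max(0, math.ceil(tau * len(s)) - 1)
    return s[k]


def nw_cdf(x0, xs, ys, theta, h):
    """Nadaraya-Watson estimate of P(Y <= theta | X = x0) fitted on (xs, ys)."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # x0 is far from every training point: fall back to the marginal proportion.
        return sum(yi <= theta for yi in ys) / len(ys)
    return sum(w for w, yi in zip(weights, ys) if yi <= theta) / total


def kde(values, point, h):
    """Gaussian kernel density estimate of the density of Y at `point`."""
    return sum(gaussian_kernel((point - v) / h) for v in values) / (len(values) * h)


def ss_quantile(x_lab, y_lab, x_unlab, tau, h_x=0.3, h_y=0.3, n_folds=2, seed=0):
    """One-step, debiased semi-supervised tau-quantile of Y (illustrative sketch)."""
    n = len(y_lab)
    theta0 = sample_quantile(y_lab, tau)  # initial estimate: supervised sample quantile

    # Cross-fitting (assumed): the imputation used at labeled point i is fit
    # on the labeled points outside i's fold.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold = [0] * n
    for pos, i in enumerate(order):
        fold[i] = pos % n_folds

    phi_lab = [0.0] * n               # out-of-fold imputations at labeled X
    phi_unlab = [0.0] * len(x_unlab)  # imputations at unlabeled X, averaged over folds
    for k in range(n_folds):
        train = [i for i in range(n) if fold[i] != k]
        xs = [x_lab[i] for i in train]
        ys = [y_lab[i] for i in train]
        for i in range(n):
            if fold[i] == k:
                phi_lab[i] = nw_cdf(x_lab[i], xs, ys, theta0, h_x)
        for j, x in enumerate(x_unlab):
            phi_unlab[j] += nw_cdf(x, xs, ys, theta0, h_x) / n_folds

    # Debiased estimating function at theta0: the imputed mean over the unlabeled
    # data plus the labeled-data mean of the residuals I(Y <= theta0) - phi(X, theta0).
    imputed = sum(phi_unlab) / len(phi_unlab)
    correction = sum((y <= theta0) - p for y, p in zip(y_lab, phi_lab)) / n
    score = imputed + correction - tau

    # One Newton-type step: divide the score by an estimate of the density of Y at theta0.
    return theta0 - score / kde(y_lab, theta0, h_y)


if __name__ == "__main__":
    rng = random.Random(1)
    x_all = [rng.gauss(0.0, 1.0) for _ in range(2200)]
    x_lab, x_unlab = x_all[:200], x_all[200:]
    y_lab = [x + rng.gauss(0.0, 0.5) for x in x_lab]
    print("supervised:", sample_quantile(y_lab, 0.5))
    print("semi-supervised:", ss_quantile(x_lab, y_lab, x_unlab, 0.5))
```

The correction term is what keeps the sketch consistent when the imputation model is poor. If phi is badly wrong, the labeled residuals cancel its bias on average. If phi is accurate, the residuals have small variance and the large unlabeled sample improves precision. This mirrors the robustness and efficiency trade-off the abstract describes.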
