Paper Title
Statistical Inference for Genetic Relatedness Based on High-Dimensional Logistic Regression
Authors
Abstract
This paper studies statistical inference for genetic relatedness between binary traits based on individual-level genome-wide association data. Specifically, under high-dimensional logistic regression models, we define parameters characterizing the cross-trait genetic correlation, the genetic covariance, and the trait-specific genetic variance. A novel weighted debiasing method is developed for the logistic Lasso estimator, and computationally efficient debiased estimators are proposed. The rates of convergence of these estimators are studied, and their asymptotic normality is established under mild conditions. Moreover, we construct confidence intervals and statistical tests for these parameters and provide theoretical justifications for the methods, including the coverage probability and expected length of the confidence intervals, as well as the size and power of the proposed tests. Numerical studies are conducted on both model-generated data and simulated genetic data to show the superiority of the proposed methods. By analyzing a real data set on autoimmune diseases, we demonstrate the ability of the proposed methods to yield novel insights into the shared genetic architecture among ten pediatric autoimmune diseases.
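To make the three parameters concrete, here is a minimal sketch in plain Python. The abstract does not state the definitions, so the forms below are an assumption: the genetic covariance is taken as the quadratic form beta' Sigma gamma, the trait-specific genetic variances as beta' Sigma beta and gamma' Sigma gamma, and the genetic correlation as the covariance divided by the square root of the product of the variances. The names `beta`, `gamma`, `sigma`, `quad_form`, and `genetic_relatedness` are hypothetical, and the code evaluates the parameters at known coefficient vectors; it does not implement the paper's weighted debiasing or inference procedure.

```python
import math


def quad_form(u, sigma, v):
    """Return u' * sigma * v for plain Python lists."""
    return sum(
        u[i] * sigma[i][j] * v[j]
        for i in range(len(u))
        for j in range(len(v))
    )


def genetic_relatedness(beta, gamma, sigma):
    """Evaluate the assumed covariance, variance, and correlation parameters.

    beta, gamma: coefficient vectors of the two logistic models (assumed known).
    sigma: covariance matrix of the genetic variants (assumed known).
    """
    cov = quad_form(beta, sigma, gamma)
    var_1 = quad_form(beta, sigma, beta)
    var_2 = quad_form(gamma, sigma, gamma)
    denom = math.sqrt(var_1 * var_2)
    # Convention chosen here: report zero correlation when either variance is zero.
    corr = cov / denom if denom > 0 else 0.0
    return {
        "covariance": cov,
        "variance_1": var_1,
        "variance_2": var_2,
        "correlation": corr,
    }


# Toy example: two traits driven by different but correlated variants.
sigma = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
beta = [1.0, 0.0, 0.0]
gamma = [0.0, 1.0, 0.0]
result = genetic_relatedness(beta, gamma, sigma)
```

In this toy case the two traits share no causal variant, yet the correlation between the first two variants induces a nonzero genetic covariance, which is why the variant covariance matrix enters the assumed definitions.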