Paper title
Direct covariance matrix estimation with compositional data
Paper authors
Paper abstract
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject's gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. Through simulation studies, we demonstrate that our estimator can outperform existing estimators. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with chronic fatigue syndrome.
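To illustrate why the compositional nature of the data rules out standard estimators, the following minimal sketch simulates latent log-abundances, converts them to relative abundances, and computes the sample covariance of the centered log-ratio (CLR) transformed data. This is not the estimator proposed in the paper; the simulated covariance, the sample sizes, and the helper names `closure` and `clr` are assumptions made purely for illustration. The point is that a naive covariance computed from compositions is singular by construction and equals a centered version of the latent covariance rather than the latent covariance itself.

```python
import numpy as np

def closure(abundances):
    """Convert positive abundances to compositions whose rows sum to 1."""
    return abundances / abundances.sum(axis=1, keepdims=True)

def clr(compositions):
    """Centered log-ratio transform: log of each part minus the row mean of logs."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, p = 200, 5  # hypothetical sample size and number of microbes

# Hypothetical positive definite covariance for the latent log-abundances.
A = rng.normal(size=(p, p))
true_cov = A @ A.T / p + np.eye(p)
log_abund = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

# Only the relative abundances are observed.
comp = closure(np.exp(log_abund))

# Naive approach: sample covariance of the CLR-transformed compositions.
clr_cov = np.cov(clr(comp), rowvar=False)

# Each row of the CLR covariance sums to (numerically) zero, so it is singular.
row_sums = clr_cov.sum(axis=1)
smallest_eig = np.linalg.eigvalsh(clr_cov)[0]

# The CLR covariance equals G S G, where S is the sample covariance of the
# latent log-abundances and G = I - (1/p) 11^T is the centering matrix.
G = np.eye(p) - np.ones((p, p)) / p
centered_latent_cov = G @ np.cov(log_abund, rowvar=False) @ G

print("row sums of CLR covariance:", row_sums)
print("smallest eigenvalue:", smallest_eig)
```

Because the CLR covariance has a zero eigenvalue regardless of the latent covariance, it cannot serve directly as a positive definite estimate of the latter, which is the gap a direct estimator is meant to fill.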