Paper title
A Hierarchical Integrative Group LASSO (HiGLASSO) Framework for Analyzing Environmental Mixtures
Paper authors
Paper abstract
Environmental health studies are increasingly measuring multiple pollutants to characterize the joint health effects attributable to exposure mixtures. However, the underlying dose-response relationship between toxicants and health outcomes of interest may be highly nonlinear, with possible nonlinear interaction effects. Existing penalized regression methods that account for exposure interactions either cannot accommodate nonlinear interactions while maintaining strong heredity or are computationally unstable in applications with limited sample size. In this paper, we propose a general shrinkage and selection framework to identify noteworthy nonlinear main and interaction effects among a set of exposures. We design hierarchical integrative group LASSO (HiGLASSO) to (a) impose strong heredity constraints on two-way interaction effects (hierarchical), (b) incorporate adaptive weights without necessitating initial coefficient estimates (integrative), and (c) induce sparsity for variable selection while respecting group structure (group LASSO). We prove sparsistency of the proposed method and apply HiGLASSO to an environmental toxicants dataset from the LIFECODES birth cohort, where the investigators are interested in understanding the joint effects of 21 urinary toxicant biomarkers on urinary 8-isoprostane, a measure of oxidative stress. An implementation of HiGLASSO is available in the higlasso R package, accessible through the Comprehensive R Archive Network.
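To make two of the ideas named in the abstract concrete, the following is a minimal sketch in Python. It is not the HiGLASSO algorithm and not the API of the higlasso R package; the function names `group_soft_threshold` and `satisfies_strong_heredity` are hypothetical and introduced only for illustration. The first function shows the standard group LASSO shrinkage step, which either zeroes out a whole coefficient group or shrinks it as a unit. The second checks the strong heredity condition: an interaction between two exposures may be selected only if both of their main effects are selected.

```python
import numpy as np


def group_soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_2 for one coefficient group.

    Illustrative only: this is the generic group LASSO shrinkage step,
    not the penalty or update rule used by HiGLASSO itself.
    """
    beta = np.asarray(beta, dtype=float)
    norm = np.linalg.norm(beta)
    if norm <= lam:
        # The whole group is removed together (group-level sparsity).
        return np.zeros_like(beta)
    # Otherwise the group is shrunk toward zero but keeps its direction.
    return (1.0 - lam / norm) * beta


def satisfies_strong_heredity(main_active, interactions_active):
    """Return True if every active interaction (j, k) has both main effects active.

    main_active: sequence of booleans, one per exposure (hypothetical layout).
    interactions_active: dict mapping index pairs (j, k) to booleans.
    """
    return all(
        main_active[j] and main_active[k]
        for (j, k), active in interactions_active.items()
        if active
    )


# A group with a large norm is shrunk; a group with a small norm is dropped.
kept = group_soft_threshold([3.0, 4.0], lam=1.0)
dropped = group_soft_threshold([0.3, 0.4], lam=1.0)

# Exposure 2 has no main effect, so an active (0, 2) interaction violates
# strong heredity, while an active (0, 1) interaction does not.
main = [True, True, False]
ok = satisfies_strong_heredity(main, {(0, 1): True, (0, 2): False})
violated = satisfies_strong_heredity(main, {(0, 2): True})
```

In a nonlinear setting such as the one the abstract describes, each group would plausibly hold the basis-expansion coefficients of one exposure or one pairwise interaction, so dropping a group removes that exposure's entire nonlinear effect at once.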