Title
Nonparametric Estimation of Repeated Densities with Heterogeneous Sample Sizes
Authors
Abstract
We consider the estimation of densities in multiple subpopulations, where the available sample size varies greatly across subpopulations. This problem occurs in epidemiology, for example, where different diseases may share similar pathogenic mechanisms but differ in their prevalence. Without specifying a parametric form, our proposed method pools information from the population and estimates the density in each subpopulation in a data-driven fashion. Drawing from functional data analysis, low-dimensional approximating density families in the form of exponential families are constructed from the principal modes of variation in the log-densities. Subpopulation densities are subsequently fitted in the approximating families based on likelihood principles and shrinkage. The approximating families increase in flexibility as the number of components increases and can approximate arbitrary infinite-dimensional densities. We also derive convergence results for the density estimates with discrete observations. The proposed methods are shown to be interpretable and efficient in simulations as well as in applications to electronic medical record and rainfall data.
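As a sketch of the construction described in the abstract, one plausible form of the approximating family is given below. The notation is assumed here for illustration and is not taken from the paper: $\mu$ denotes a mean log-density, $\phi_1, \dots, \phi_K$ the leading principal modes of variation of the log-densities, and $\theta \in \mathbb{R}^K$ a subpopulation-specific parameter.

```latex
% Illustrative notation only (assumed, not the paper's own symbols).
f_\theta(x) = \exp\Big\{ \mu(x) + \sum_{k=1}^{K} \theta_k \phi_k(x) - B(\theta) \Big\},
\qquad
B(\theta) = \log \int \exp\Big\{ \mu(x) + \sum_{k=1}^{K} \theta_k \phi_k(x) \Big\} \, dx .
```

Here $B(\theta)$ is the log-normalizing constant, so that $f_\theta$ integrates to one. Under this reading, fitting a subpopulation amounts to estimating its $\theta$ from the available sample, for instance by a penalized likelihood in which the penalty supplies the shrinkage, and increasing $K$ enlarges the family.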