Paper title

Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion

Authors

Zhao, Jianhua, Shang, Changchun, Li, Shulan, Xin, Ling, Yu, Philip L. H.

Abstract

The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the 'complete' sample size $N$ is the same whether the data are complete or incomplete. For incomplete data, there are often only $N_i<N$ observations for variable $i$, which means that using the 'complete' sample size $N$ implausibly ignores the amount of missing information inherent in incomplete data. Given this observation, a novel criterion called hierarchical BIC (HBIC) for factor analysis with incomplete data is proposed. The novelty is that it uses only the actual amounts of observed information, namely the $N_i$'s, in the penalty term. Theoretically, it is shown that HBIC is a large sample approximation of the variational Bayesian (VB) lower bound, and BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite sample performance of HBIC, BIC, and related criteria with various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small.
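
The contrast between the two penalty terms can be illustrated with a small sketch. The abstract does not state the exact HBIC formula, so the form below is an assumption: each variable $i$ is taken to contribute $d_i$ free parameters, the BIC penalty is taken as $\frac{1}{2}\sum_i d_i \log N$, and the HBIC penalty as $\frac{1}{2}\sum_i d_i \log N_i$. The function names, the per-variable parameter counts, and the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def bic_penalty(params_per_variable, n):
    """Assumed BIC penalty: every parameter is charged log(N) for the full sample size N."""
    return 0.5 * sum(params_per_variable) * math.log(n)


def hbic_penalty(params_per_variable, n_observed):
    """Assumed HBIC-style penalty: the parameters of variable i are charged log(N_i),
    where N_i is the number of cases in which variable i is actually observed."""
    if len(params_per_variable) != len(n_observed):
        raise ValueError("need one observed count N_i per variable")
    return 0.5 * sum(d * math.log(n_i) for d, n_i in zip(params_per_variable, n_observed))


# Hypothetical setup: 4 variables, 3 free parameters each, N = 100 cases.
params = [3, 3, 3, 3]
n_total = 100

# No missing values: every N_i equals N, so the two penalties coincide.
complete = hbic_penalty(params, [n_total] * len(params))

# Some values missing: N_i < N for some variables, so the HBIC-style penalty is smaller.
incomplete = hbic_penalty(params, [100, 80, 60, 40])

print(bic_penalty(params, n_total), complete, incomplete)
```

Under these assumptions the two penalties agree when nothing is missing and drift apart as the per-variable counts $N_i$ fall below $N$, which matches the abstract's observation that HBIC and BIC behave similarly at small missing rates.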
