Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization

Paper title

Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization

Authors

Chang, Joshua C., Fletcher, Patrick, Han, Jungmin, Chang, Ted L., Vattikuti, Shashaank, Desmet, Bart, Zirikly, Ayah, Chow, Carson C.

Abstract

Dimensionality reduction methods for count data are critical to a wide range of applications in medical informatics and other fields where model interpretability is paramount. For such data, hierarchical Poisson matrix factorization (HPF) and other sparse probabilistic non-negative matrix factorization (NMF) methods are considered to be interpretable generative models. They consist of sparse transformations for decoding their learned representations into predictions. However, sparsity in representation decoding does not necessarily imply sparsity in the encoding of representations from the original data features. HPF is often incorrectly interpreted in the literature as if it possesses encoder sparsity. The distinction between decoder sparsity and encoder sparsity is subtle but important. Due to the lack of encoder sparsity, HPF does not possess the column-clustering property of classical NMF -- the factor loading matrix does not sufficiently define how each factor is formed from the original features. We address this deficiency by self-consistently enforcing encoder sparsity, using a generalized additive model (GAM), thereby allowing one to relate each representation coordinate to a subset of the original data features. In doing so, the method also gains the ability to perform feature selection. We demonstrate our method on simulated data and give an example of how encoder sparsity is of practical use in a concrete application of representing inpatient comorbidities in Medicare patients.
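
The distinction between decoder sparsity and encoder sparsity can be illustrated with a small linear analogy. The sketch below is not HPF and not the authors' GAM-based method; it assumes a hypothetical 2-factor, 3-feature non-negative decoder matrix and uses the least-squares pseudo-inverse as a stand-in encoder. The names `decoder` and `encoder` are illustrative only.

```python
import numpy as np

# Hypothetical decoder: 2 factors (rows) by 3 features (columns).
# Each factor loads on only two of the three features, so the decoder is sparse.
decoder = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])

# Stand-in linear encoder (an assumption for illustration): the least-squares
# map for x ~ z @ decoder, so that z = x @ encoder. Shape is (3 features, 2 factors).
encoder = np.linalg.pinv(decoder)

# Count exact-zero entries in each map.
decoder_zeros = int(np.sum(np.isclose(decoder, 0.0)))
encoder_zeros = int(np.sum(np.isclose(encoder, 0.0)))

print("zeros in decoder:", decoder_zeros)
print("zeros in encoder:", encoder_zeros)
print(encoder)
```

In this toy case the decoder contains zero entries while the pseudo-inverse encoder has none, so every feature contributes to every factor when encoding. Reading the sparse decoder rows as if they listed the features that form each factor would therefore be misleading, which is the kind of misinterpretation the abstract describes.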
