Paper Title


Explaining Neural Matrix Factorization with Gradient Rollback

Paper Authors

Carolin Lawrence, Timo Sztyler, Mathias Niepert

Paper Abstract


Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models. We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
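To make the core idea concrete, below is a minimal, hypothetical sketch of the gradient-rollback workflow, not the authors' implementation. It assumes a toy DistMult-style matrix factorization scorer in NumPy; the names (`params`, `sgd_step`, `influence_on`) and the simple loss are illustrative. The sketch records the parameter updates each training triple causes (only a few embedding rows per SGD step) and later "rolls back" those updates to estimate that triple's influence on a test prediction without retraining.

```python
import numpy as np

# Hypothetical sketch of the gradient-rollback idea with a DistMult-style
# matrix factorization model. Names, shapes, and the loss are illustrative.

rng = np.random.default_rng(0)
dim, lr = 4, 0.1

# Embedding parameters: entities and relations kept in one parameter table.
params = {
    ("ent", "alice"): rng.normal(size=dim),
    ("ent", "movie1"): rng.normal(size=dim),
    ("ent", "movie2"): rng.normal(size=dim),
    ("rel", "likes"): rng.normal(size=dim),
}

def score(h, r, t):
    """DistMult-style score <e_h, w_r, e_t>."""
    return float(np.sum(params[("ent", h)] * params[("rel", r)] * params[("ent", t)]))

# Influence table: for every training triple, accumulate the parameter changes
# it caused. Each SGD step touches only three embedding rows, so this table
# stays small even when the overall model is large.
updates = {}

def sgd_step(triple):
    h, r, t = triple
    # Gradients of a simple loss L = -score(h, r, t) w.r.t. the touched rows.
    grads = {
        ("ent", h): -params[("rel", r)] * params[("ent", t)],
        ("rel", r): -params[("ent", h)] * params[("ent", t)],
        ("ent", t): -params[("ent", h)] * params[("rel", r)],
    }
    acc = updates.setdefault(triple, {})
    for key, g in grads.items():
        delta = -lr * g
        params[key] = params[key] + delta
        acc[key] = acc.get(key, np.zeros(dim)) + delta

def influence_on(train_triple, test_triple):
    """Estimate the influence of `train_triple` on the score of `test_triple`
    by rolling back its recorded updates and re-scoring, without retraining."""
    before = score(*test_triple)
    for key, delta in updates.get(train_triple, {}).items():
        params[key] = params[key] - delta  # roll back this example's updates
    after = score(*test_triple)
    for key, delta in updates.get(train_triple, {}).items():
        params[key] = params[key] + delta  # restore the trained parameters
    return before - after

# Toy usage: train on two triples, then ask how much one of them
# contributed to a test-time prediction.
for _ in range(20):
    sgd_step(("alice", "likes", "movie1"))
    sgd_step(("alice", "likes", "movie2"))
print(influence_on(("alice", "likes", "movie1"), ("alice", "likes", "movie1")))
```

Because the influence table stores only the embedding rows each training example actually touched, the bookkeeping overhead stays proportional to the number of touched parameters rather than the full model size, which is what makes this kind of influence estimation efficient at both training and test time.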
