Feature Removal Is a Unifying Principle for Model Explanation Methods

Paper Title

Feature Removal Is a Unifying Principle for Model Explanation Methods

Authors

Ian Covert, Scott Lundberg, Su-In Lee

Abstract

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We examine the literature and find that many methods are based on a shared principle of explaining by removing - essentially, measuring the impact of removing sets of features from a model. These methods vary in several respects, so we develop a framework for removal-based explanations that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). Exposing the fundamental similarities between these methods empowers users to reason about which tools to use, and suggests promising directions for ongoing model explainability research.
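The three dimensions named in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' implementation: the function names (`remove_features`, `explain_by_removal`, `toy_model`) are hypothetical, and each dimension is filled in with one simple assumed choice. Features are removed by substituting a fixed baseline value, the model behavior explained is the prediction for a single input, and each feature's influence is summarized as the change in prediction when that feature alone is removed.

```python
from typing import Callable, List, Sequence, Set


def remove_features(x: Sequence[float], baseline: Sequence[float],
                    keep: Set[int]) -> List[float]:
    """Dimension 1 (assumed choice): 'remove' every feature not in `keep`
    by replacing it with the corresponding baseline value."""
    return [xi if i in keep else bi
            for i, (xi, bi) in enumerate(zip(x, baseline))]


def explain_by_removal(model: Callable[[Sequence[float]], float],
                       x: Sequence[float],
                       baseline: Sequence[float]) -> List[float]:
    """Dimension 2 (assumed choice): explain the model's prediction for x.
    Dimension 3 (assumed choice): summarize each feature's influence as the
    drop in prediction when only that feature is removed."""
    all_features = set(range(len(x)))
    full_prediction = model(remove_features(x, baseline, all_features))
    scores = []
    for i in range(len(x)):
        prediction_without_i = model(
            remove_features(x, baseline, all_features - {i}))
        scores.append(full_prediction - prediction_without_i)
    return scores


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical linear model; the third feature is ignored."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]


scores = explain_by_removal(toy_model, [1.0, 3.0, 5.0], [0.0, 0.0, 0.0])
print(scores)  # [2.0, -3.0, 0.0]
```

Swapping any one of the three choices (for example, averaging over sampled replacement values instead of using a fixed baseline, or scoring subsets of features rather than one feature at a time) gives a different explanation method with the same overall structure, which is the sense in which the abstract describes the methods as sharing a common principle.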
