Paper Title

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Paper Authors

Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

Paper Abstract

Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is the Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. Linear separability is usually implicitly assumed but does not hold true in general. In this work, we start from the original intent of concept-based interpretation and propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We show that, for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in the concept affects the model's prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrate empirically that CG outperforms CAV on both toy examples and real-world datasets.
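
The abstract describes CG as measuring how a small change in a concept affects the model's prediction, i.e., a chain-rule (gradient-style) attribution carried out in concept space rather than input space. The sketch below is a minimal illustration of one way such an attribution could be computed, assuming a differentiable concept function g: R^d -> R^m and a differentiable scalar model f: R^d -> R; the toy functions, variable names, and the pseudo-inverse formulation are illustrative assumptions, not the paper's reference implementation.

import torch
from torch.autograd.functional import jacobian

d = 4  # toy input dimension

# Toy differentiable concept function g: R^d -> R^2 (deliberately non-linear)
# and a toy scalar model f: R^d -> R. Both are illustrative stand-ins.
def g(x):
    return torch.stack([torch.sin(x).sum(), (x ** 2).sum()])

def f(x):
    return torch.tanh(x).sum()

x = torch.randn(d, requires_grad=True)

# Jacobian of the concept scores w.r.t. the input, shape (m, d).
J = jacobian(g, x)

# Gradient of the prediction w.r.t. the input, shape (d,).
grad_f = torch.autograd.grad(f(x), x)[0]

# Chain-rule attribution in concept space: find v with grad_f ~= J^T v
# via the Moore-Penrose pseudo-inverse, yielding one score per concept.
concept_grad = torch.linalg.pinv(J.T) @ grad_f

print(concept_grad)  # shape (m,): sensitivity of f to each concept

The least-squares solve via the pseudo-inverse recovers a per-concept sensitivity consistent with grad_f ≈ Jᵀ v; in the special case of a single linear concept it reduces (up to scaling) to a directional derivative along the concept direction, similar in spirit to CAV-based sensitivity, which is how this sketch connects to the "beyond linear concept functions" claim in the abstract.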
