Paper Title

Evaluating Explanations: How much do explanations from the teacher aid students?

Paper Authors

Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, William W. Cohen

Paper Abstract

While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared to prior proposals, our approach is less easily gamed, enabling principled, automatic, model-agnostic evaluation of attributions. Using our framework, we compare numerous attribution methods for text classification and question answering, and observe quantitative differences that are consistent (to a moderate to high degree) across different student model architectures and learning strategies.
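
The abstract describes the evaluation protocol only at a high level. The sketch below is a hypothetical illustration of that student-teacher simulation setup, not the authors' implementation: synthetic data and scikit-learn logistic regression stand in for text classifiers, a toy `explain` function stands in for the attribution methods, and masking non-highlighted features during student training stands in for the paper's learning strategies. The explanation's value is scored as the student's simulation-accuracy gain on test data where explanations are unavailable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a classification task: 100 features, only 5 informative.
X = rng.normal(size=(2000, 100))
w_true = np.zeros(100)
w_true[:5] = 3.0
y = (X @ w_true + rng.normal(size=2000) > 0).astype(int)

# Teacher model: trained on ample data; its predictions are what the student simulates.
teacher = LogisticRegression(max_iter=1000).fit(X, y)

def explain(model, X, k=5):
    """Toy attribution method (assumption, not from the paper):
    mark the top-k features per example by |weight * feature value|."""
    saliency = np.abs(model.coef_[0] * X)
    top = np.argsort(-saliency, axis=1)[:, :k]
    mask = np.zeros_like(X, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

# Student simulation setup: small training set labeled by the teacher;
# explanations are available at training time only, never on the test inputs.
X_tr, X_te = train_test_split(rng.normal(size=(300, 100)), test_size=0.5, random_state=0)
y_tr, y_te = teacher.predict(X_tr), teacher.predict(X_te)

# Baseline student: plain simulation training, no explanations.
student_plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_plain = (student_plain.predict(X_te) == y_te).mean()

# Explanation-aware student: one simple strategy is to zero out non-highlighted
# features during training; the test inputs are left untouched.
mask_tr = explain(teacher, X_tr)
student_expl = LogisticRegression(max_iter=1000).fit(X_tr * mask_tr, y_tr)
acc_expl = (student_expl.predict(X_te) == y_te).mean()

print(f"simulation accuracy without explanations: {acc_plain:.3f}")
print(f"simulation accuracy with explanations:    {acc_expl:.3f}")
print(f"explanation value (accuracy gain):        {acc_expl - acc_plain:.3f}")
```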
