Paper Title
Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
Paper Authors
Paper Abstract
Post-hoc explanation methods are gaining popularity for interpreting, understanding, and debugging neural networks. Most analyses using such methods explain decisions in response to inputs drawn from the test set. However, the test set may have few examples that trigger some model behaviors, such as high-confidence failures or ambiguous classifications. To address these challenges, we introduce a flexible model inspection framework: Bayes-TrEx. Given a data distribution, Bayes-TrEx finds in-distribution examples with a specified prediction confidence. We demonstrate several use cases of Bayes-TrEx, including revealing highly confident (mis)classifications, visualizing class boundaries via ambiguous examples, understanding novel-class extrapolation behavior, and exposing neural network overconfidence. We use Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set. Code is available at https://github.com/serenabooth/Bayes-TrEx.
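The idea of finding in-distribution examples with a specified prediction confidence can be illustrated with a small toy sketch. It is not the authors' implementation (see the repository above for that). It assumes a one-dimensional standard normal data distribution, a hypothetical logistic classifier `classifier_confidence`, and a Gaussian relaxation of the confidence constraint with a hypothetical width `sigma`, and it draws inputs with a plain random-walk Metropolis sampler.

```python
import math
import random


def classifier_confidence(x):
    """Hypothetical toy classifier: logistic probability of class 1."""
    return 1.0 / (1.0 + math.exp(-3.0 * x))


def log_prior(x):
    """Unnormalized log-density of the assumed standard normal data distribution."""
    return -0.5 * x * x


def log_posterior(x, target, sigma):
    """Prior plus a Gaussian penalty that pulls the confidence toward `target`."""
    residual = classifier_confidence(x) - target
    return log_prior(x) - 0.5 * (residual / sigma) ** 2


def sample_examples(target, sigma=0.02, n_steps=20000, step=0.1, seed=0):
    """Random-walk Metropolis over inputs; returns the second half of the chain."""
    rng = random.Random(seed)
    x = 0.0
    current = log_posterior(x, target, sigma)
    samples = []
    for i in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, target, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            x, current = proposal, candidate
        if i >= n_steps // 2:  # discard the first half as burn-in
            samples.append(x)
    return samples


# Ask for ambiguous examples: inputs the toy classifier scores near 0.5.
samples = sample_examples(target=0.5)
confidences = [classifier_confidence(x) for x in samples]
mean_confidence = sum(confidences) / len(confidences)
```

Setting `target` close to 1.0 instead would look for highly confident inputs under the same assumed data distribution. A real image classifier would need a higher-dimensional data model and a more capable sampler than this random walk.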