Paper Title
Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability
Paper Authors
Abstract
To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed-upon definition of interpretability. Instead, we propose a shift in perspective. Rather than attempt to define interpretability, we propose to model the \emph{act} of \emph{enforcing} interpretability. As a starting point, we focus on the setting of empirical risk minimization for binary classification, and view interpretability as a constraint placed on learning. That is, we assume we are given a subset of hypotheses that are deemed to be interpretable, possibly depending on the data distribution and other aspects of the context. We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses. This model allows us to reason about the statistical implications of enforcing interpretability, using known results in statistical learning theory. Focusing on accuracy, we perform a case analysis, explaining why one may or may not observe a trade-off between accuracy and interpretability when the restriction to interpretable classifiers does or does not come at the cost of some excess statistical risk. We close with some worked examples and some open problems, which we hope will spur further theoretical development around the trade-offs involved in interpretability.
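As a concrete illustration of the abstract's framing (not an example from the paper itself), the sketch below performs empirical risk minimization over a full hypothesis class of threshold classifiers and over a smaller subset standing in for the "interpretable" hypotheses. The choice of threshold classifiers, the synthetic data, and which subset counts as interpretable are all hypothetical; the point is only that the gap between the two minimized empirical risks models the potential accuracy cost of enforcing interpretability.

```python
# Hypothetical sketch: ERM over a full hypothesis class H_full versus an
# "interpretable" subset H_int, on synthetic one-dimensional data.
import random

random.seed(0)

# Synthetic binary classification data on [0, 1] with true threshold 0.35.
X = [random.uniform(0, 1) for _ in range(200)]
y = [1 if x > 0.35 else 0 for x in X]

# Full class: threshold classifiers sign(x - t) with fine-grained thresholds.
H_full = [t / 100 for t in range(101)]
# "Interpretable" subset (illustrative choice): only coarse, round thresholds.
H_int = [t / 10 for t in range(11)]

def empirical_risk(threshold, X, y):
    """Fraction of points misclassified by the rule: predict 1 iff x > threshold."""
    errors = sum((1 if x > threshold else 0) != label for x, label in zip(X, y))
    return errors / len(X)

# ERM: pick the hypothesis with smallest empirical risk in each class.
risk_full = min(empirical_risk(t, X, y) for t in H_full)
risk_int = min(empirical_risk(t, X, y) for t in H_int)

# risk_int - risk_full >= 0 is the empirical price of restricting to H_int;
# it may be zero (no trade-off) or positive (a trade-off), as in the case
# analysis the abstract describes.
print(risk_full, risk_int, risk_int - risk_full)
```

Since `H_full` contains the true threshold 0.35, its minimized empirical risk is zero here, while the coarse subset typically misclassifies points falling between its nearest round threshold and 0.35.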