Paper Title

Explainability for fair machine learning

Authors

Tom Begley, Tobias Schwedes, Christopher Frye, Ilya Feige

Abstract

As the decisions made or influenced by machine learning models increasingly impact our lives, it is crucial to detect, understand, and mitigate unfairness. But even simply determining what "unfairness" should mean in a given context is non-trivial: there are many competing definitions, and choosing between them often requires a deep understanding of the underlying task. It is thus tempting to use model explainability to gain insights into model fairness; however, existing explainability tools do not reliably indicate whether a model is indeed fair. In this work we present a new approach to explaining fairness in machine learning, based on the Shapley value paradigm. Our fairness explanations attribute a model's overall unfairness to individual input features, even in cases where the model does not operate on sensitive attributes directly. Moreover, motivated by the linearity of Shapley explainability, we propose a meta algorithm for applying existing training-time fairness interventions, wherein one trains a perturbation to the original model, rather than a new model entirely. By explaining the original model, the perturbation, and the fair-corrected model, we gain insight into the accuracy-fairness trade-off that is being made by the intervention. We further show that this meta algorithm enjoys both flexibility and stability benefits with no loss in performance.
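
The abstract describes attributing a model's overall unfairness to individual input features using Shapley values. The following is a minimal, hypothetical sketch of what such an attribution could look like, assuming demographic parity as the unfairness metric and a fixed baseline for features outside a coalition; the function names, the baseline scheme, and the Monte-Carlo permutation estimator are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the paper's implementation): Monte-Carlo Shapley
# attribution of a demographic-parity gap to individual input features.
# Assumptions: features outside a coalition are replaced by a fixed baseline,
# and unfairness is the mean-prediction gap between two sensitive groups.
import numpy as np

def unfairness(model, X, sensitive, baseline, coalition):
    """Demographic-parity gap of `model` when only features in `coalition`
    keep their true values; all other features are set to `baseline`."""
    X_masked = np.tile(baseline, (len(X), 1))
    X_masked[:, coalition] = X[:, coalition]
    preds = model(X_masked)
    return abs(preds[sensitive == 1].mean() - preds[sensitive == 0].mean())

def shapley_fairness_attribution(model, X, sensitive, baseline,
                                 n_samples=200, seed=None):
    """Estimate each feature's Shapley contribution to the overall unfairness
    by averaging its marginal effect over random feature orderings."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        order = rng.permutation(n_features)
        coalition = []
        v_prev = unfairness(model, X, sensitive, baseline, coalition)
        for j in order:
            coalition.append(j)
            v_curr = unfairness(model, X, sensitive, baseline, coalition)
            phi[j] += v_curr - v_prev
            v_prev = v_curr
    # Attributions average to each feature's marginal contribution; they sum
    # to the unfairness of the full model minus that of the empty coalition.
    return phi / n_samples

In this toy setup, `model` is any callable returning prediction scores, `sensitive` is a binary group indicator, and a larger attribution phi[j] would indicate that feature j contributes more to the demographic-parity gap under these assumptions.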
