Paper Title

Explainability Via Causal Self-Talk

Authors

Nicholas A. Roy, Junkyung Kim, Neil Rabinowitz

Abstract

Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has been unwilling to pay in most situations. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk. CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically-meaningful explanations of their own behavior. Beyond explanations, we also demonstrate that these learned models provide new ways of building semantic control interfaces to AI systems.
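
The abstract only gestures at the mechanism, so here is a minimal, hypothetical sketch of the self-communication loop it describes: an agent whose emitted message at one timestep is fed back as an input at the next. This is not the paper's actual architecture; `SelfTalkAgent`, `speaker_head`, and all dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class SelfTalkAgent(nn.Module):
    """Toy illustration of "communicating with yourself across time".

    All names and sizes here are hypothetical, not taken from the paper.
    """

    def __init__(self, obs_dim: int, msg_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        # The recurrent core consumes the observation AND the previous message.
        self.core = nn.GRUCell(obs_dim + msg_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        # "Speaker" head: emits a message addressed to the agent's future self.
        self.speaker_head = nn.Linear(hidden_dim, msg_dim)

    def step(self, obs, h, prev_msg):
        # Feeding prev_msg back in is the self-talk loop: the message
        # becomes part of the causal pathway into future behavior.
        h = self.core(torch.cat([obs, prev_msg], dim=-1), h)
        action_logits = self.policy_head(h)
        msg = torch.tanh(self.speaker_head(h))
        return action_logits, h, msg

# Unrolling one step with arbitrary sizes:
agent = SelfTalkAgent(obs_dim=16, msg_dim=4, hidden_dim=64, n_actions=5)
obs, h, msg = torch.zeros(1, 16), torch.zeros(1, 64), torch.zeros(1, 4)
logits, h, msg = agent.step(obs, h, msg)
```

This sketch omits the training objectives that, per the abstract, ground these messages so they are semantically meaningful and causally tied to behavior, which is what makes the resulting explanations faithful.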
