Paper Title
Debiasing Convolutional Neural Networks via Meta Orthogonalization
Paper Authors
Paper Abstract
While deep learning models often achieve strong task performance, their successes are hampered by their inability to disentangle spurious correlations from causative factors, such as when they use protected attributes (e.g., race, gender, etc.) to make decisions. In this work, we tackle the problem of debiasing convolutional neural networks (CNNs) in such instances. Building on existing work on debiasing word embeddings and model interpretability, our Meta Orthogonalization method encourages the CNN representations of different concepts (e.g., gender and class labels) to be orthogonal to one another in activation space while maintaining strong downstream task performance. Through a variety of experiments, we systematically test our method and demonstrate that it significantly mitigates model bias and is competitive with current adversarial debiasing methods.
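The orthogonality idea in the abstract can be illustrated with a small sketch. It assumes that each concept (a protected attribute such as gender, or a class label) is summarised by a single direction vector in activation space, and it penalises the squared cosine similarity between the bias direction and each class direction. The function names, the toy vectors, and the exact form of the penalty are hypothetical choices made for illustration; the abstract does not specify the loss, so this should not be read as the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def orthogonality_penalty(bias_vector, class_vectors):
    """Mean squared cosine similarity between one bias direction and
    each class direction. It is 0 when every class direction is
    orthogonal to the bias direction and grows towards 1 as they align.
    (Assumed form of the penalty, not taken from the paper.)"""
    sims = [cosine(bias_vector, c) ** 2 for c in class_vectors]
    return sum(sims) / len(sims)


# Hypothetical 3-dimensional concept directions, for illustration only.
gender = [1.0, 0.0, 0.0]
classes_entangled = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]   # share a component with gender
classes_orthogonal = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no component along gender

penalty_entangled = orthogonality_penalty(gender, classes_entangled)
penalty_orthogonal = orthogonality_penalty(gender, classes_orthogonal)
print(penalty_entangled, penalty_orthogonal)
```

In a training setup, a term of this kind would typically be added to the task loss with a weighting coefficient, so that the network trades off orthogonality between concepts against downstream accuracy. That weighting is likewise an assumption here.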