Paper Title
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Paper Authors
Paper Abstract
Natural language explanations promise to offer intuitively understandable explanations of a neural network's decision process in complex vision-language tasks, as pursued in recent VL-NLE models. While current models offer impressive performance on task accuracy and explanation plausibility, they suffer from a range of issues: some feature a modular design in which the explanation-generation module is poorly integrated with a separate task-answer prediction module, others employ backbone models trained on a limited set of tasks, or incorporate ad hoc solutions to increase performance on single datasets. We propose to overcome these limitations by applying recent advances in large-scale multi-task pretraining of generative Transformer models to VL-NLE tasks. Our approach outperforms recent models by a large margin, with human annotators preferring the generated explanations over the ground truth in two out of three evaluated datasets. As a novel challenge in VL-NLE research, we propose the problem of multi-task VL-NLE and show that jointly training on multiple tasks can improve explanation quality. We discuss the ethical implications of high-quality NLE generation and other issues in recent VL-NLE research.
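As a hedged illustration of the joint multi-task training idea the abstract describes, the sketch below fine-tunes a single generative model on a mixture of NLE-style tasks that share an "answer because explanation" target format. Everything concrete here is an assumption for illustration only: a text-only t5-small stands in for the paper's vision-language backbone, the two in-line records stand in for real datasets such as VQA-X and e-SNLI-VE, and the <image> placeholder marks where visual features would enter in an actual VL-NLE model.

```python
# Minimal sketch of joint multi-task NLE fine-tuning (illustrative only;
# not the paper's actual backbone, data, or training recipe).
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One record per task; a task prefix tells the shared model which dataset
# an example comes from, so a single model serves all tasks jointly.
# Targets combine the task answer and its explanation in one sequence.
MIXED_EXAMPLES = [
    ("vqa-x: question: what season is it? context: <image>",
     "winter because there is snow on the ground"),
    ("esnli-ve: hypothesis: a dog runs outside. context: <image>",
     "entailment because the image shows a dog running in a field"),
]

class MultiTaskNLEDataset(Dataset):
    """Wraps the mixed-task examples as (input, target) token tensors."""
    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        source, target = self.examples[idx]
        enc = tokenizer(source, truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
        labels = tokenizer(target, truncation=True, max_length=64,
                           padding="max_length", return_tensors="pt").input_ids
        labels[labels == tokenizer.pad_token_id] = -100  # ignore pad in loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels.squeeze(0)}

loader = DataLoader(MultiTaskNLEDataset(MIXED_EXAMPLES),
                    batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for batch in loader:  # a single pass; real training runs many epochs
    loss = model(**batch).loss  # one LM loss covers answer and explanation
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The task-prefix convention is the standard T5-style way to share one generative model across datasets; it is used here only to make the multi-task mixing concrete, and should be read as a schematic of the training setup rather than the authors' implementation.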