Paper title
CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval
Paper authors
Paper abstract
Fact-checking has gained increasing attention due to the wide spread of falsified information. Most fact-checking approaches focus only on claims made in English because of data scarcity in other languages. The lack of fact-checking datasets in low-resource languages calls for an effective cross-lingual transfer technique for fact-checking. Additionally, trustworthy information in different languages can be complementary and helpful in verifying facts. To this end, we present the first fact-checking framework augmented with cross-lingual retrieval, which aggregates evidence retrieved from multiple languages through a cross-lingual retriever. Given the absence of cross-lingual information retrieval datasets with claim-like queries, we train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a self-supervised algorithm that creates training instances by translating the title of a passage. The goal of X-ICT is to learn cross-lingual retrieval: the model learns to identify the passage corresponding to a given translated title. On the X-Fact dataset, our approach achieves a 2.23% absolute F1 improvement in the zero-shot cross-lingual setup over prior systems. The source code and data are publicly available at https://github.com/khuangaf/CONCRETE.
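To make the X-ICT idea more concrete, the following is a minimal sketch of how such training instances might be built and scored. It is not taken from the CONCRETE repository. The names `Passage`, `XICTInstance`, `build_xict_instances`, `toy_translate`, and `in_batch_contrastive_loss` are hypothetical, the dictionary lookup stands in for a real translation system, and the in-batch contrastive loss is an assumption, since the abstract does not state which training objective is used.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Passage:
    title: str  # title in the passage's own language
    text: str   # passage body
    lang: str   # language code of the passage


@dataclass
class XICTInstance:
    query: str     # translated title, used as a claim-like pseudo-query
    positive: str  # the passage that the title belongs to


def build_xict_instances(
    passages: Sequence[Passage],
    translate: Callable[[str, str, str], str],
    target_lang: str,
) -> List[XICTInstance]:
    """Pair each passage with its title translated into target_lang."""
    return [
        XICTInstance(query=translate(p.title, p.lang, target_lang), positive=p.text)
        for p in passages
    ]


def in_batch_contrastive_loss(
    query_vecs: Sequence[Sequence[float]],
    passage_vecs: Sequence[Sequence[float]],
) -> float:
    """Mean cross-entropy over dot-product scores (assumed objective).

    Passage i is the positive for query i; the other passages in the
    batch act as negatives.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [sum(a * b for a, b in zip(q, p)) for p in passage_vecs]
        top = max(scores)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(query_vecs)


# Toy stand-in for a machine translation system (hypothetical).
TOY_DICTIONARY = {"Cambio climático": "Climate change", "Vacunas": "Vaccines"}


def toy_translate(text: str, src_lang: str, tgt_lang: str) -> str:
    return TOY_DICTIONARY.get(text, text)


passages = [
    Passage("Cambio climático", "El cambio climático afecta a todo el planeta.", "es"),
    Passage("Vacunas", "Las vacunas previenen enfermedades infecciosas.", "es"),
]
instances = build_xict_instances(passages, toy_translate, target_lang="en")

# Hand-written toy embeddings: aligned pairs should give a lower loss
# than pairs whose passages have been swapped.
aligned_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the query is in English while the positive passage stays in Spanish, so a retriever trained on such pairs has to match text across languages. In a real setup the toy embeddings would come from a multilingual encoder.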