Paper Title
GenericsKB: A Knowledge Base of Generic Statements
Paper Authors
Paper Abstract
We present a new resource for the NLP community, namely a large (3.5M+ sentence) knowledge base of *generic statements*, e.g., "Trees remove carbon dioxide from the atmosphere", collected from multiple corpora. This is the first large resource to contain *naturally occurring* generic sentences, as opposed to extracted or crowdsourced triples, and thus is rich in high-quality, general, semantically complete statements. All GenericsKB sentences are annotated with their topical term, surrounding context (sentences), and a (learned) confidence. We also release GenericsKB-Best (1M+ sentences), containing the best-quality generics in GenericsKB augmented with selected, synthesized generics from WordNet and ConceptNet. In tests on two existing datasets requiring multihop reasoning (OBQA and QASC), we find using GenericsKB can result in higher scores and better explanations than using a much larger corpus. This demonstrates that GenericsKB can be a useful resource for NLP applications, as well as providing data for linguistic studies of generics and their semantics. GenericsKB is available at https://allenai.org/data/genericskb.