Paper Title
Informative Causality Extraction from Medical Literature via Dependency-tree based Patterns
Authors
Abstract
Extracting cause-effect entities from medical literature is an important task in medical information retrieval. A solution to this task can be used to compile various causal relations, such as those between diseases and symptoms, between medications and side effects, and between genes and diseases. Existing solutions for extracting cause-effect entities work well for sentences in which the cause and effect phrases are named entities, single-word nouns, or noun phrases of two to three words. In medical literature, however, the cause and effect phrases in a sentence are not simply nouns or noun phrases; rather, they are complex phrases consisting of several words, and existing methods fail to extract the cause and effect entities correctly from such sentences. Partial extraction of cause and effect entities yields poor-quality, non-informative, and often contradictory facts compared to the one intended in the given sentence. In this work, we address this problem by designing an unsupervised method for cause and effect phrase extraction, PatternCausality, which is specifically suited to medical literature. Our approach first uses a collection of cause-effect dependency patterns as templates to extract the head words of the cause and effect phrases, and then uses a novel phrase extraction method to obtain complete and meaningful cause and effect phrases from a sentence. Experiments on a cause-effect dataset built from sentences in PubMed articles show that, for extracting cause and effect entities, PatternCausality is substantially better than existing methods, with an order of magnitude improvement in F-score over the best of them.
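
The two-step idea in the abstract (match a dependency pattern to find head words, then expand each head into a full phrase) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the PatternCausality implementation: the example sentence, its hand-written dependency parse, the single pattern in `PATTERNS`, and the use of plain subtree expansion as a stand-in for the paper's phrase extraction method, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # position of the token in the sentence
    text: str  # surface form
    head: int  # index of the syntactic head (-1 for the root)
    dep: str   # dependency label relative to the head


# Hypothetical, hand-written parse of:
# "Prolonged use of aspirin causes bleeding in the stomach"
SENTENCE = [
    Token(0, "Prolonged", 1, "amod"),
    Token(1, "use", 4, "nsubj"),
    Token(2, "of", 1, "prep"),
    Token(3, "aspirin", 2, "pobj"),
    Token(4, "causes", -1, "ROOT"),
    Token(5, "bleeding", 4, "dobj"),
    Token(6, "in", 5, "prep"),
    Token(7, "the", 8, "det"),
    Token(8, "stomach", 6, "pobj"),
]

# Illustrative pattern: (trigger word, label of cause head, label of effect head).
PATTERNS = [("causes", "nsubj", "dobj")]


def find_heads(tokens, patterns):
    """Step 1: return (cause_head, effect_head) for the first matching pattern."""
    for trigger, cause_dep, effect_dep in patterns:
        for tok in tokens:
            if tok.text.lower() != trigger:
                continue
            # Direct dependents of the trigger, keyed by dependency label.
            children = {t.dep: t for t in tokens if t.head == tok.idx}
            if cause_dep in children and effect_dep in children:
                return children[cause_dep], children[effect_dep]
    return None


def subtree_phrase(tokens, head):
    """Step 2 (stand-in): expand a head word to all tokens in its subtree."""
    keep = {head.idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep:
                keep.add(t.idx)
                changed = True
    # Tokens are stored in sentence order, so the phrase keeps that order.
    return " ".join(t.text for t in tokens if t.idx in keep)


def extract(tokens, patterns=PATTERNS):
    """Return (cause_phrase, effect_phrase), or None if no pattern matches."""
    heads = find_heads(tokens, patterns)
    if heads is None:
        return None
    cause_head, effect_head = heads
    return subtree_phrase(tokens, cause_head), subtree_phrase(tokens, effect_head)


if __name__ == "__main__":
    print(extract(SENTENCE))
```

In this toy case the head words alone would give "use" and "bleeding", which loses the information a reader needs; expanding each head to its subtree recovers "Prolonged use of aspirin" and "bleeding in the stomach". A real pipeline would obtain the parse from a dependency parser rather than hard-coding it.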