Iterative Document-level Information Extraction via Imitation Learning

Paper Title

Iterative Document-level Information Extraction via Imitation Learning

Authors

Yunmo Chen, William Gantt, Weiwei Gu, Tongfei Chen, Aaron Steven White, Benjamin Van Durme

Abstract

We present a novel iterative extraction model, IterX, for extracting complex relations, or templates (i.e., N-tuples representing a mapping from named slots to spans of text) within a document. Documents may feature zero or more instances of a template of any given type, and the task of template extraction entails identifying the templates in a document and extracting each template's slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP), and relieves the need to use predefined template orders to train an extractor. It leads to state-of-the-art results on two established benchmarks -- 4-ary relation extraction on SciREX and template extraction on MUC-4 -- as well as a strong baseline on the new BETTER Granular task.
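
To make the abstract's notion of a template concrete, the following is a minimal sketch of one way to represent "a mapping from named slots to spans of text" in Python. It is not the authors' implementation: the class names `Span` and `Template`, the helper functions, the example document, and the template types and slot names are all hypothetical, and allowing several spans per slot is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A text span given by character offsets (hypothetical representation)."""
    start: int  # inclusive
    end: int    # exclusive

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass
class Template:
    """A typed template: a mapping from named slots to spans of text."""
    template_type: str
    # Assumption: each slot may hold several spans.
    slots: Dict[str, List[Span]] = field(default_factory=dict)


def find_span(document: str, phrase: str) -> Span:
    """Locate the first occurrence of `phrase` (raises ValueError if absent)."""
    start = document.index(phrase)
    return Span(start, start + len(phrase))


def slot_values(template: Template, document: str) -> Dict[str, List[str]]:
    """Resolve every slot's spans back to their surface strings."""
    return {
        name: [span.text(document) for span in spans]
        for name, spans in template.slots.items()
    }


# A document may contain zero or more templates of any given type.
document = "A bomb exploded in Lima. Later, gunmen attacked a bank in Cusco."
templates = [
    Template("bombing", {"location": [find_span(document, "Lima")]}),
    Template(
        "attack",
        {
            "perpetrator": [find_span(document, "gunmen")],
            "target": [find_span(document, "a bank")],
            "location": [find_span(document, "Cusco")],
        },
    ),
]

for template in templates:
    print(template.template_type, slot_values(template, document))
```

Under this representation, the output of template extraction for a document is simply a list of `Template` objects, which is empty when the document contains no instance of any template type.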
