Paper Title
Iterative Document-level Information Extraction via Imitation Learning
Paper Authors
Paper Abstract
We present a novel iterative extraction model, IterX, for extracting complex relations, or templates (i.e., N-tuples representing a mapping from named slots to spans of text), within a document. A document may contain zero or more instances of a template of any given type, and the task of template extraction entails identifying the templates in a document and extracting each template's slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP) and removes the need to use predefined template orders to train an extractor. It leads to state-of-the-art results on two established benchmarks (4-ary relation extraction on SciREX and template extraction on MUC-4) as well as a strong baseline on the new BETTER Granular task.
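To make the notion of a template concrete, the following is a minimal Python sketch of the data structure the abstract describes: a typed record that maps named slots to spans of text, with a document holding zero or more such records. It is an illustration only, not the IterX model or the authors' code. The names `Span`, `Template`, `find_span`, and `render`, the template type `Result`, and the slot names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    """A span of text given by character offsets (hypothetical helper type)."""
    start: int  # inclusive
    end: int    # exclusive

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass
class Template:
    """A template: a type plus a mapping from slot names to text spans."""
    template_type: str
    slots: Dict[str, List[Span]] = field(default_factory=dict)


def find_span(document: str, phrase: str) -> Span:
    """Locate the first occurrence of `phrase`; stands in for a real span finder."""
    start = document.find(phrase)
    if start < 0:
        raise ValueError(f"phrase not found: {phrase!r}")
    return Span(start, start + len(phrase))


def render(document: str, templates: List[Template]) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Resolve every span to its surface string for inspection."""
    return [
        (t.template_type,
         {slot: [s.text(document) for s in spans] for slot, spans in t.slots.items()})
        for t in templates
    ]


# Illustrative document and template; the type and slot names are assumptions.
document = "Model X reaches 90 accuracy on dataset Y."
templates = [
    Template("Result", {
        "method": [find_span(document, "Model X")],
        "metric": [find_span(document, "accuracy")],
        "dataset": [find_span(document, "dataset Y")],
    })
]
```

Calling `render(document, templates)` resolves each slot to its surface strings. A document with no instances of a template type is represented by an empty list.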