Detecting Narrative Elements in Informational Text

Paper title

Detecting Narrative Elements in Informational Text

Authors

Levi, Effi; Mor, Guy; Sheafer, Tamir; Shenhav, Shaul R.

Abstract

Automatic extraction of narrative elements from text, combining narrative theories with computational models, has been receiving increasing attention over the last few years. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal stories texts. Instead, we direct our focus to informational texts, specifically news stories. We introduce NEAT (Narrative Elements AnnoTation) - a novel NLP task for detecting narrative elements in raw text. For this purpose, we designed a new multi-label narrative annotation scheme, better suited for informational text (e.g. news media), by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success). We then used this scheme to annotate a new dataset of 2,209 sentences, compiled from 46 news articles from various category domains. We trained a number of supervised models in several different setups over the annotated dataset to identify the different narrative elements, achieving an average F1 score of up to 0.77. The results demonstrate the holistic nature of our annotation scheme as well as its robustness to domain category.
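
To make the evaluation setting concrete, here is a minimal sketch of how an average F1 score can be computed for multi-label sentence annotation. The label names (Complication, Resolution, Success) come from the abstract; everything else is an assumption for illustration. In particular, the abstract does not say whether the reported average is macro or micro, so the macro average below is only one plausible reading, and the four toy sentences are invented, not drawn from the NEAT dataset.

```python
# Illustrative sketch only: per-label F1 and its macro average for
# multi-label sentence annotations. Not the authors' evaluation code.

LABELS = ["Complication", "Resolution", "Success"]


def f1_for_label(gold, pred, label):
    """F1 for one label; gold and pred are lists of label sets, one per sentence."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the label never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Toy annotations (hypothetical): a sentence may carry zero, one, or several labels.
gold = [{"Complication"}, {"Complication", "Resolution"}, set(), {"Success"}]
pred = [{"Complication"}, {"Resolution"}, {"Success"}, {"Success"}]

per_label = {lab: round(f1_for_label(gold, pred, lab), 3) for lab in LABELS}
average = round(macro_f1(gold, pred), 3)
print(per_label, average)
```

Because each sentence holds a set of labels rather than a single class, every label is scored as its own binary decision before averaging, which is what distinguishes the multi-label setting from ordinary single-label classification.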
