Full-Text Argumentation Mining on Scientific Publications

Paper title

Full-Text Argumentation Mining on Scientific Publications

Authors

Arne Binder, Bhuvanesh Verma, Leonhard Hennig

Abstract

Scholarly Argumentation Mining (SAM) has recently gained attention due to its potential to help scholars with the rapid growth of published scientific literature. It comprises two subtasks: argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE), both of which are challenging since they require, e.g., the integration of domain knowledge, the detection of implicit statements, and the disambiguation of argument structure. While previous work focused on dataset construction and baseline methods for specific document sections, such as the abstract or results, full-text scholarly argumentation mining has seen little progress. In this work, we introduce a sequential pipeline model combining ADUR and ARE for full-text SAM, and provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks. We establish a new SotA for ADUR on the Sci-Arg corpus, outperforming the previous best reported result by a large margin (+7% F1). We also present the first results for ARE, and thus for the full AM pipeline, on this benchmark dataset. Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges and that data annotation needs to be more consistent.
