DebateSum: A large-scale argument mining and summarization dataset

Paper title

DebateSum: A large-scale argument mining and summarization dataset

Authors

Allen Roush, Arvind Balaji

Abstract

Prior work in Argument Mining frequently alludes to its potential applications in automatic debating systems. Despite this focus, almost no datasets or models exist which apply natural language processing techniques to problems found within competitive formal debate. To remedy this, we present the DebateSum dataset. DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries. DebateSum was made using data compiled by competitors within the National Speech and Debate Association over a 7-year period. We train several transformer summarization models to benchmark summarization performance on DebateSum. We also introduce a set of fasttext word-vectors trained on DebateSum called debate2vec. Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today. The DebateSum search engine is available to the public here: http://www.debate.cards
