Paper title
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
Paper authors
Paper abstract
Evaluating automatically generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging techniques from information retrieval. RISE is first trained as a retrieval task using a dual-encoder retrieval setup, and can then be used to evaluate a generated summary given an input document, without gold reference summaries. RISE is especially well suited to new datasets where reference summaries may not be available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021), and the results show that RISE correlates more highly with human evaluations than many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data efficiency and generalizability across languages.
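
The dual-encoder idea in the abstract can be illustrated with a minimal sketch. It is not the RISE implementation, and everything in it is an assumption made for illustration: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a trained neural encoder, the score is a plain dot product, and the training objective is assumed to be a softmax loss with in-batch negatives, which the abstract does not specify.

```python
import hashlib
import math
from typing import List

DIM = 64  # embedding size; an arbitrary choice for this sketch


def toy_encode(text: str) -> List[float]:
    """Hypothetical stand-in for a trained encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def score(document: str, summary: str) -> float:
    """Reference-free score: dot product of the document and summary embeddings."""
    doc_vec = toy_encode(document)
    sum_vec = toy_encode(summary)
    return sum(d * s for d, s in zip(doc_vec, sum_vec))


def in_batch_retrieval_loss(documents: List[str], summaries: List[str]) -> float:
    """Softmax cross-entropy in which summaries[i] is the positive for documents[i]
    and the other summaries in the batch act as negatives (an assumed objective)."""
    total = 0.0
    for i, doc in enumerate(documents):
        logits = [score(doc, s) for s in summaries]
        max_logit = max(logits)
        # Numerically stable log-sum-exp over all candidate summaries.
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_norm - logits[i]
    return total / len(documents)
```

At evaluation time only `score(document, summary)` is needed, which is why no gold reference summary is required. In a real system the two encoders would be learned networks trained to minimize a retrieval loss of this kind.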