Paper Title

StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning

Paper Authors

Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama

Paper Abstract

Existing automatic story evaluation methods place a premium on story lexical-level coherence, deviating from human preference. We go beyond this limitation by considering a novel \textbf{S}tory \textbf{E}valuation method that mimics human preference when judging a story, namely \textbf{StoryER}, which consists of three sub-tasks: \textbf{R}anking, \textbf{R}ating and \textbf{R}easoning. Given either a machine-generated or a human-written story, StoryER requires the machine to output 1) a preference score that corresponds to human preference, 2) specific ratings and their corresponding confidences, and 3) comments on various aspects (e.g., opening, character-shaping). To support these tasks, we introduce a well-annotated dataset comprising (i) 100k ranked story pairs; and (ii) a set of 46k ratings and comments on various aspects of the story. We finetune Longformer-Encoder-Decoder (LED) on the collected dataset, with the encoder responsible for preference score and aspect prediction and the decoder for comment generation. Our comprehensive experiments establish a competitive benchmark for each task, showing a high correlation with human preference. In addition, we observe that jointly learning the preference score, the aspect ratings, and the comments brings gains to each individual task. Our dataset and benchmarks are publicly available to advance the research of story evaluation tasks.\footnote{Dataset and pre-trained model demo are available at \url{http://storytelling-lab.com/eval} and \url{https://github.com/sairin1202/StoryER}}
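
To make the modeling setup concrete, below is a minimal sketch of how a single LED backbone could serve all three sub-tasks at once: the encoder output feeds two scoring heads (preference score for ranking, per-aspect logits for rating), while the decoder generates the free-form comment (reasoning). This is a hypothetical illustration based only on the abstract, not the authors' released code; the class name, head designs, mean-pooling, number of aspects, and checkpoint `allenai/led-base-16384` are all assumptions.

```python
# Hypothetical sketch of a StoryER-style model: one LED backbone whose
# encoder feeds scoring heads (ranking + rating) and whose decoder
# generates aspect comments (reasoning). Head designs are assumptions.
import torch
import torch.nn as nn
from transformers import LEDModel


class StoryERSketch(nn.Module):
    def __init__(self, model_name: str = "allenai/led-base-16384",
                 num_aspects: int = 5):  # number of aspects is assumed
        super().__init__()
        self.led = LEDModel.from_pretrained(model_name)
        hidden = self.led.config.d_model
        # Encoder heads: a scalar preference score and per-aspect rating logits.
        self.preference_head = nn.Linear(hidden, 1)
        self.aspect_head = nn.Linear(hidden, num_aspects)
        # Decoder head: project decoder states to the vocabulary for comments.
        self.lm_head = nn.Linear(hidden, self.led.config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask, decoder_input_ids):
        out = self.led(input_ids=input_ids,
                       attention_mask=attention_mask,
                       decoder_input_ids=decoder_input_ids)
        # Mean-pool the encoder states as a story representation (an assumed
        # pooling choice; the abstract does not specify one).
        story_repr = out.encoder_last_hidden_state.mean(dim=1)
        preference = self.preference_head(story_repr).squeeze(-1)
        aspect_logits = self.aspect_head(story_repr)
        comment_logits = self.lm_head(out.last_hidden_state)
        return preference, aspect_logits, comment_logits
```

The 100k ranked story pairs naturally suggest a pairwise training objective, e.g. `torch.nn.MarginRankingLoss` over the preference scores of the preferred and dispreferred story in each pair, alongside standard cross-entropy losses for the aspect ratings and comment tokens; these loss choices are likewise assumptions rather than details taken from the paper.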
