Paper Title

StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

Paper Authors

Adam Liška, Tomáš Kočiský, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, Cyprien de Masson d'Autume, Tim Scholtes, Manzil Zaheer, Susannah Young, Ellen Gilsenan-McMahon, Sophia Austin, Phil Blunsom, Angeliki Lazaridou

Paper Abstract

Knowledge and language understanding of models evaluated through question answering (QA) has been usually studied on static snapshots of knowledge, like Wikipedia. However, our world is dynamic, evolves over time, and our models' knowledge becomes outdated. To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date, to be answered from 14 years of time-stamped news articles. We evaluate our models quarterly as they read new articles not seen in pre-training. We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting. For semi-parametric models, adding new articles into the search space allows for rapid adaptation, however, models with an outdated underlying LM under-perform those with a retrained LM. For questions about higher-frequency named entities, parametric updates are particularly beneficial. In our dynamic world, the StreamingQA dataset enables a more realistic evaluation of QA models, and our experiments highlight several promising directions for future research.
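To make the streaming setup concrete, below is a minimal Python sketch of how a StreamingQA-style example and a date-restricted retrieval step for a semi-parametric QA model might look. All class and field names here are hypothetical illustrations and do not follow the official dataset release or the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout for a StreamingQA-style example: a question asked
# on a given date, answerable from time-stamped news articles.
@dataclass
class StreamingQAExample:
    question: str
    question_date: date   # the date on which the question is asked
    answers: list[str]    # acceptable answer strings
    evidence_date: date   # publication date of the supporting news article


@dataclass
class Article:
    text: str
    published: date


def retrieve(corpus: list[Article], example: StreamingQAExample, k: int = 5) -> list[Article]:
    """Toy retrieval step for a semi-parametric QA model: only articles published
    on or before the question date are eligible, mimicking the streaming setup in
    which the search space grows quarterly as new articles are added."""
    eligible = [a for a in corpus if a.published <= example.question_date]

    # Placeholder scoring: rank by naive keyword overlap with the question.
    # A real system would use a learned retriever over the news corpus.
    def score(a: Article) -> int:
        return len(set(example.question.lower().split()) & set(a.text.lower().split()))

    return sorted(eligible, key=score, reverse=True)[:k]
```

The date filter is the key point of the sketch: the retrieval search space is updated with newly published articles over time, while the underlying parametric LM may or may not be retrained, which is the trade-off the paper's experiments examine.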
