Paper Title
Memorizing Transformers
Paper Authors
Paper Abstract
Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory up to 262K tokens. On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined functions and theorems during test time.
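To make the mechanism in the abstract concrete, below is a minimal NumPy sketch of attention augmented with an external, non-differentiable memory of past (key, value) pairs. It is not the authors' implementation: retrieval here is exact brute-force top-k rather than approximate kNN, the gate mixing local and memory attention is a fixed scalar rather than a learned parameter, and names such as `KNNMemory` and `memory_augmented_attention` are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KNNMemory:
    """External, non-differentiable store of past (key, value) pairs.

    Retrieval here is exact brute-force top-k; the paper scales to very
    large memories (up to 262K tokens) with approximate kNN search."""

    def __init__(self, dim, capacity=262144):
        self.keys = np.zeros((0, dim), dtype=np.float32)
        self.values = np.zeros((0, dim), dtype=np.float32)
        self.capacity = capacity

    def add(self, keys, values):
        # Append new pairs; drop the oldest entries beyond capacity.
        self.keys = np.concatenate([self.keys, keys])[-self.capacity:]
        self.values = np.concatenate([self.values, values])[-self.capacity:]

    def lookup(self, queries, k=32):
        # Return the top-k (key, value) pairs for each query by dot product.
        scores = queries @ self.keys.T                      # (q, mem)
        idx = np.argsort(-scores, axis=-1)[:, :k]           # (q, k)
        return self.keys[idx], self.values[idx]             # (q, k, d)

def memory_augmented_attention(q, local_k, local_v, memory, gate=0.5, top_k=32):
    """Combine attention over the local context with attention over
    retrieved memories, mixed by a (here fixed) gate in [0, 1]."""
    d = q.shape[-1]
    # Standard attention over the current segment.
    local_out = softmax(q @ local_k.T / np.sqrt(d)) @ local_v
    # Attention over the retrieved top-k memories, per query position.
    mem_k, mem_v = memory.lookup(q, k=top_k)                # (q, k, d)
    mem_scores = softmax(np.einsum('qd,qkd->qk', q, mem_k) / np.sqrt(d))
    mem_out = np.einsum('qk,qkd->qd', mem_scores, mem_v)
    return gate * mem_out + (1 - gate) * local_out

# Example: store one segment's keys/values, then attend from the next segment.
rng = np.random.default_rng(0)
dim = 64
memory = KNNMemory(dim)
memory.add(rng.standard_normal((512, dim), dtype=np.float32),
           rng.standard_normal((512, dim), dtype=np.float32))
q = rng.standard_normal((128, dim), dtype=np.float32)
k = rng.standard_normal((128, dim), dtype=np.float32)
v = rng.standard_normal((128, dim), dtype=np.float32)
out = memory_augmented_attention(q, k, v, memory)
print(out.shape)  # (128, 64)
```

Because the memory is populated at inference time and retrieval is non-differentiable, new documents (e.g. a freshly defined function or theorem) can influence predictions immediately, without any weight updates.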