Paper Title

Efficient Meta Lifelong-Learning with Limited Memory

Authors

Zirui Wang, Sanket Vaibhav Mehta, Barnabás Póczos, Jaime Carbonell

Abstract

Current natural language processing models work well on a single task, yet they often fail to continuously learn new tasks without forgetting previous ones as they are re-trained throughout their lifetime, a challenge known as lifelong learning. State-of-the-art lifelong language learning methods store past examples in episodic memory and replay them at both training and inference time. However, as we show later in our experiments, there are three significant impediments: (1) needing an unrealistically large memory module to achieve good performance, (2) suffering from negative transfer, and (3) requiring multiple local adaptation steps for each test example, which significantly slows down inference. In this paper, we identify three common principles of lifelong learning methods and propose an efficient meta-lifelong framework that combines them in a synergistic fashion. To achieve sample efficiency, our method trains the model so that it learns a better initialization for local adaptation. Extensive experiments on text classification and question answering benchmarks demonstrate the effectiveness of our framework: it achieves state-of-the-art performance using merely 1% of the memory size and narrows the gap with multi-task learning. We further show that our method alleviates both catastrophic forgetting and negative transfer at the same time.
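The abstract describes storing past examples in an episodic memory and running a few steps of local adaptation per test example. Below is a minimal PyTorch sketch of that general pattern, not the authors' released implementation: the names EpisodicMemory and locally_adapt_and_predict are hypothetical, a small linear model stands in for the full text encoder used in the paper, and retrieval is simplified to random sampling rather than nearest-neighbour lookup.

```python
# Minimal sketch of episodic-memory replay with local adaptation at inference.
# Hypothetical names and a toy linear model; illustrative only.
import copy
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class EpisodicMemory:
    """Stores a small fraction of past training examples for replay."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.examples = []  # list of (x, y) tensors

    def write(self, x, y):
        # Reservoir-style write keeps the memory within its capacity budget.
        if len(self.examples) < self.capacity:
            self.examples.append((x, y))
        else:
            idx = random.randrange(len(self.examples))
            self.examples[idx] = (x, y)

    def sample(self, k):
        k = min(k, len(self.examples))
        return random.sample(self.examples, k)


def locally_adapt_and_predict(model, memory, x_test, steps=5, lr=1e-3, k=8):
    """Fine-tune a copy of the model on k retrieved examples, then predict.

    The abstract notes that needing many such steps per test example slows
    inference; the proposed framework learns an initialization that makes
    this adaptation sample-efficient.
    """
    neighbours = memory.sample(k)
    if not neighbours:  # nothing stored yet: predict without adaptation
        with torch.no_grad():
            return model(x_test.unsqueeze(0)).argmax(dim=-1)

    adapted = copy.deepcopy(model)  # keep the base parameters intact
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = sum(F.cross_entropy(adapted(x.unsqueeze(0)), y.unsqueeze(0))
                   for x, y in neighbours) / len(neighbours)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return adapted(x_test.unsqueeze(0)).argmax(dim=-1)


if __name__ == "__main__":
    model = nn.Linear(16, 4)              # stand-in for the task model
    memory = EpisodicMemory(capacity=32)  # small budget, in the spirit of "1% memory"
    for _ in range(64):                   # simulated training stream
        x, y = torch.randn(16), torch.randint(0, 4, ())
        memory.write(x, y)
    print(locally_adapt_and_predict(model, memory, torch.randn(16)))
```

In this sketch, each prediction pays for several gradient steps on retrieved examples, which is exactly the inference-speed impediment the abstract points out; the paper's meta-lifelong framework addresses it by training the base parameters to be a good starting point for this local adaptation.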
