Paper Title
Exclusive Supermask Subnetwork Training for Continual Learning
Paper Authors
Paper Abstract
Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding catastrophic forgetting. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. Because the network weights are never updated, SupSup prevents forgetting. However, its performance is sub-optimal because the fixed weights restrict its representational power, and there is no accumulation or transfer of knowledge inside the model as new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping training of subnetwork weights. This avoids conflicting updates to shared weights from subsequent tasks, improving performance while still preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge Transfer (KKT) module that utilizes previously acquired knowledge to learn new tasks better and faster. We demonstrate that ExSSNeT outperforms strong previous methods in both the NLP and vision domains while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate only 2-10% of the model parameters, yielding an average improvement of 8.3% over SupSup, and it scales to a large number of tasks (100). Our code is available at https://github.com/prateeky2806/exessnet.
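
To make the exclusive-training idea concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the repository above for that); the class and method names (MaskedLinear, exclusive_update, commit_task) are illustrative assumptions. The layer's forward pass uses only the weights kept by a task's supermask, and its update step touches only masked weights that no earlier task has claimed, so weights shared with previous tasks stay frozen and forgetting is avoided.

    import torch

    class MaskedLinear(torch.nn.Module):
        """Linear layer whose forward pass uses only weights kept by a task mask."""

        def __init__(self, in_features, out_features):
            super().__init__()
            # Fixed, randomly initialized base weights, as in SupSup.
            self.weight = torch.nn.Parameter(0.02 * torch.randn(out_features, in_features))
            # Union of the masks of all previously learned tasks; these
            # weights are frozen so earlier tasks are never overwritten.
            self.register_buffer(
                "used_mask", torch.zeros(out_features, in_features, dtype=torch.bool)
            )

        def forward(self, x, task_mask):
            # Only weights selected by this task's supermask participate.
            return torch.nn.functional.linear(x, self.weight * task_mask)

        def exclusive_update(self, task_mask, lr=0.1):
            # Train only weights this task selected AND no earlier task has
            # claimed -- the "exclusive and non-overlapping" update.
            trainable = task_mask.bool() & ~self.used_mask
            with torch.no_grad():
                self.weight -= lr * self.weight.grad * trainable
            self.weight.grad = None

        def commit_task(self, task_mask):
            # After a task is finished, mark its weights as used/frozen.
            self.used_mask |= task_mask.bool()

    layer = MaskedLinear(8, 4)
    mask = (torch.rand_like(layer.weight) < 0.1).float()  # a sparse (~10%) supermask
    out = layer(torch.randn(2, 8), mask)
    out.sum().backward()
    layer.exclusive_update(mask)
    layer.commit_task(mask)

In a full CL setup, the supermask itself would be learned per task (as in SupSup) rather than sampled randomly; the sketch only isolates the exclusive weight-update step.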
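
The abstract describes the KKT module only at a high level; as one hedged illustration of a KNN-based transfer step, the following sketch (the function name select_source_task and the per-task feature-bank representation are assumptions, not the paper's API) uses k-nearest-neighbor votes over stored features of previous tasks to pick the most similar previous task, whose supermask could then initialize the new task's mask for better and faster learning.

    import torch

    def select_source_task(new_feats, task_feature_banks, k=5):
        # new_feats: (n, d) features from a few samples of the new task.
        # task_feature_banks: list of (m_i, d) feature tensors, one per
        # previously learned task. Returns the id of the previous task that
        # wins the majority of k-nearest-neighbor votes.
        bank = torch.cat(task_feature_banks)
        labels = torch.cat([torch.full((f.shape[0],), i)
                            for i, f in enumerate(task_feature_banks)])
        dists = torch.cdist(new_feats, bank)                   # (n, total)
        votes = labels[dists.topk(k, largest=False).indices]   # (n, k) task ids
        return int(votes.flatten().mode().values)              # majority vote

    # Toy usage with random features for two previous tasks:
    banks = [torch.randn(20, 16), torch.randn(20, 16) + 3.0]
    new_task = torch.randn(10, 16) + 3.0   # resembles task 1
    print(select_source_task(new_task, banks))  # -> 1

The selected task's supermask would then serve as the starting point for learning the new task's mask instead of a fresh random initialization.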