Paper Title
Memformer: A Memory-Augmented Transformer for Sequence Modeling
Paper Authors
Paper Abstract
Transformers have achieved remarkable success in sequence modeling. However, these models suffer from efficiency issues because they need to store all historical token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new optimization scheme, memory replay back-propagation (MRBP), which enables long-range back-propagation through time with a significantly reduced memory requirement. Experimental results show that Memformer achieves performance comparable to the baselines while using 8.1x less memory space and running 3.2x faster at inference. Analysis of the attention patterns shows that our external memory slots can encode and retain important information across timesteps.
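To make the external dynamic memory described in the abstract concrete, the minimal sketch below shows one way a segment-level read/write interface could look: tokens read from a fixed set of memory slots via cross-attention, and the slots are then updated from the segment output. The class name MemoryAugmentedLayer, the slot count, and the residual update rule are assumptions made for illustration only, not the authors' released implementation; the sketch is meant to show why a fixed number of slots gives constant memory cost per segment and linear time in the number of segments.

```python
# Hypothetical sketch of a Memformer-style external memory interface.
# Names and the exact update rule are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MemoryAugmentedLayer(nn.Module):
    """One segment-level step: read from fixed-size memory slots via
    cross-attention, then write an updated memory from the segment output."""

    def __init__(self, d_model: int = 512, num_slots: int = 8, num_heads: int = 8):
        super().__init__()
        # A fixed number of memory slots -> constant memory cost per segment.
        self.init_memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.read_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.write_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def initial_memory(self, batch_size: int) -> torch.Tensor:
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

    def forward(self, segment: torch.Tensor, memory: torch.Tensor):
        # Read: each token attends to the memory slots to retrieve past context.
        read_out, _ = self.read_attn(query=segment, key=memory, value=memory)
        hidden = segment + read_out
        # Write: each slot attends to the segment output to update the memory.
        mem_update, _ = self.write_attn(query=memory, key=hidden, value=hidden)
        new_memory = memory + mem_update
        return hidden, new_memory


if __name__ == "__main__":
    layer = MemoryAugmentedLayer()
    memory = layer.initial_memory(batch_size=2)
    # A long sequence is processed as fixed-size segments: total time grows
    # linearly with the number of segments, while the memory tensor stays the same size.
    for segment in torch.randn(4, 2, 16, 512):  # 4 segments of 16 tokens each
        hidden, memory = layer(segment, memory)
    print(hidden.shape, memory.shape)  # torch.Size([2, 16, 512]) torch.Size([2, 8, 512])
```

In this reading of the abstract, only the slot tensor is carried between segments, so the storage passed across timesteps is constant regardless of sequence length; the MRBP scheme mentioned above addresses the separate question of how gradients are propagated through those carried memory states during training.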