Paper Title
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Paper Authors
Paper Abstract
Traditional (unstructured) pruning methods for a Transformer model focus on regularizing individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mappings in the residual connections by applying a thresholding operator on the function norm. It is applicable to any structured module, including a single attention head, an entire attention block, or a feed-forward subnetwork. Furthermore, we introduce spectral normalization to stabilize the distribution of the post-activation values of the Transformer layers, which improves the pruning effectiveness of the proposed method. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance. Specifically, we improve performance over the state of the art by 0.5 to 1.0% on average at a 50% compression ratio.
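To make the mechanism concrete, below is a minimal sketch of the two ingredients the abstract describes: a residual block whose non-linear branch can be replaced by the identity, and a thresholding operator that prunes branches with a small function norm. The names `PrunableResidual`, `prune_by_threshold`, and the threshold `tau` are hypothetical, and using the product of per-layer spectral norms (estimated by power iteration) as a proxy for the branch's function norm is an assumption for illustration; the paper defines its own norm and trains with the identity prior before thresholding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spectral_norm(weight: torch.Tensor, n_iter: int = 5) -> torch.Tensor:
    """Estimate the largest singular value of a weight matrix via power iteration."""
    w = weight.reshape(weight.shape[0], -1)
    u = torch.randn(w.shape[0], device=w.device)
    v = F.normalize(w.t() @ u, dim=0)
    for _ in range(n_iter):
        u = F.normalize(w @ v, dim=0)
        v = F.normalize(w.t() @ u, dim=0)
    return u @ w @ v  # approximates sigma_max(w)


class PrunableResidual(nn.Module):
    """Residual block y = x + f(x) whose non-linear branch f can be pruned away,
    reducing the block to an identity mapping (hypothetical wrapper)."""

    def __init__(self, f: nn.Module):
        super().__init__()
        self.f = f
        self.pruned = False  # once True, the block computes y = x

    def function_norm(self) -> torch.Tensor:
        # Proxy for ||f||: product of spectral norms of the branch's linear layers
        # (an assumed surrogate, not the paper's exact definition).
        norms = [spectral_norm(m.weight) for m in self.f.modules()
                 if isinstance(m, nn.Linear)]
        return torch.stack(norms).prod()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pruned:
            return x          # identity mapping: the branch is discarded
        return x + self.f(x)  # standard residual connection


def prune_by_threshold(blocks, tau: float) -> None:
    """Thresholding operator: discard any residual branch whose norm falls below tau."""
    with torch.no_grad():
        for blk in blocks:
            if blk.function_norm() < tau:
                blk.pruned = True
```

The same wrapper applies to any of the structured modules named in the abstract (a single attention head, a full attention block, or a feed-forward subnetwork), since each sits inside a residual connection; pruning then removes the whole branch rather than individual weights.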