Paper Title

Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models

Authors

Wang Qi, Yu-Ping Ruan, Yuan Zuo, Taihao Li

Abstract

Conventional fine-tuning encounters increasing difficulties given the size of current Pre-trained Language Models (PLMs), which makes parameter-efficient tuning a focal point of frontier research. Previous methods in this field add tunable adapters into the multi-head attention (MHA) and/or feed-forward network (FFN) modules of Transformer blocks to enable PLMs to achieve transferability. However, the power of layer normalization, an important part of the Transformer architecture, has been ignored for parameter-efficient tuning. In this paper, we first propose LN-tuning, which tunes only the gain and bias terms of the Layer Normalization modules with merely 0.03% of the parameters; it is highly time-efficient and significantly outperforms baselines with fewer than 0.1% tunable parameters. Further, we study unified frameworks that combine LN-tuning with previous methods and find that: (1) the unified framework combining prefix-tuning, the adapter-based method working on MHA, and LN-tuning achieves SOTA performance; (2) unified frameworks that tune MHA and LayerNorm simultaneously improve performance, whereas those that tune FFN and LayerNorm simultaneously cause performance to decrease. An ablation study validates that LN-tuning contains no redundant parameters and provides a further understanding of it.
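
The core mechanism described in the abstract, training only the LayerNorm gain and bias while the rest of the PLM stays frozen, can be sketched in a few lines of PyTorch. The snippet below is an illustrative sketch rather than the authors' implementation; the backbone ("bert-base-uncased"), the learning rate, and the choice to also train the task head are assumptions for the example.

```python
# Minimal sketch of LN-tuning (illustrative only, not the paper's released code):
# freeze a pre-trained backbone and leave only the LayerNorm gain/bias terms
# (plus the task head) trainable. Model name and learning rate are assumptions.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

for name, param in model.named_parameters():
    # LayerNorm parameters appear as "...LayerNorm.weight" (gain) and
    # "...LayerNorm.bias"; the classification head is kept trainable as well,
    # as any parameter-efficient method must train a task-specific head.
    param.requires_grad = "LayerNorm" in name or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.3f}%)")

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-4
)
```

On a BERT-base-sized backbone, the LayerNorm gain and bias terms alone amount to only a few tens of thousands of parameters, roughly the 0.03% figure quoted in the abstract (the task-specific head is counted separately).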
