Paper Title

AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model

Paper Authors

Yongyu Yan, Kui Xue, Xiaoming Shi, Qi Ye, Jingping Liu, Tong Ruan

Paper Abstract

Continual pretraining is a popular way of building a domain-specific pretrained language model from a general-domain language model. Despite its high efficiency, continual pretraining suffers from catastrophic forgetting, which may harm the model's performance on downstream tasks. To alleviate this issue, in this paper we propose a continual pretraining method for BERT-based models, named Attention-FFN (AF) Adapter. Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Furthermore, we train a domain-specific language model, AF Adapter-based RoBERTa, for the Chinese biomedical domain. In experiments, the models are applied to downstream tasks for evaluation. The results demonstrate that, with only about 17% of model parameters trained, AF Adapter achieves gains of 0.6% and 2% in performance on average compared to strong baselines. Further experimental results show that our method alleviates the catastrophic forgetting problem by 11% compared to the fine-tuning method.
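
As a rough illustration of the idea stated in the abstract, the minimal PyTorch sketch below widens a feed-forward block with a few extra trainable hidden units and a self-attention block with a few extra trainable heads, while freezing the original pretrained weights. This is not the authors' implementation: the class names (`AFAdapterFFN`, `AFAdapterAttention`), the dimensions (768 hidden size, 12 heads), and the sizes of the additions are illustrative assumptions.

```python
# Minimal sketch of the AF Adapter idea (assumed names and sizes, not the paper's code):
# widen each self-attention block with extra heads and each FFN with extra hidden units,
# and train only those additions while the original weights stay frozen.
import torch
import torch.nn as nn


class AFAdapterFFN(nn.Module):
    """Feed-forward block whose hidden layer is widened by `extra_hidden` trainable units."""

    def __init__(self, d_model=768, d_hidden=3072, extra_hidden=192):
        super().__init__()
        # Frozen "pretrained" part (randomly initialized here purely for illustration).
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        for p in (*self.fc1.parameters(), *self.fc2.parameters()):
            p.requires_grad = False
        # Trainable extra hidden units appended to the FFN's hidden layer.
        self.fc1_extra = nn.Linear(d_model, extra_hidden)
        self.fc2_extra = nn.Linear(extra_hidden, d_model, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        # Original path plus the contribution of the added hidden units.
        return self.fc2(self.act(self.fc1(x))) + self.fc2_extra(self.act(self.fc1_extra(x)))


class AFAdapterAttention(nn.Module):
    """Multi-head self-attention widened with `extra_heads` trainable heads."""

    def __init__(self, d_model=768, n_heads=12, extra_heads=2):
        super().__init__()
        self.d_head = d_model // n_heads
        self.n_heads, self.extra_heads = n_heads, extra_heads
        # Frozen original heads.
        self.qkv = nn.Linear(d_model, 3 * n_heads * self.d_head)
        for p in self.qkv.parameters():
            p.requires_grad = False
        # Trainable extra heads.
        self.qkv_extra = nn.Linear(d_model, 3 * extra_heads * self.d_head)
        # Output projection over the concatenation of all heads. For simplicity it is
        # trained as a whole here; a stricter variant would freeze the slice that acts
        # on the original heads.
        self.out = nn.Linear((n_heads + extra_heads) * self.d_head, d_model)

    def forward(self, x):
        B, T, _ = x.shape

        def split(qkv, h):
            # (B, T, 3*h*d_head) -> three tensors of shape (B, h, T, d_head)
            q, k, v = qkv.chunk(3, dim=-1)
            return (t.view(B, T, h, self.d_head).transpose(1, 2) for t in (q, k, v))

        q1, k1, v1 = split(self.qkv(x), self.n_heads)
        q2, k2, v2 = split(self.qkv_extra(x), self.extra_heads)
        q = torch.cat([q1, q2], dim=1)
        k = torch.cat([k1, k2], dim=1)
        v = torch.cat([v1, v2], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(ctx)


if __name__ == "__main__":
    x = torch.randn(2, 16, 768)           # (batch, seq_len, d_model)
    print(AFAdapterAttention()(x).shape)   # torch.Size([2, 16, 768])
    print(AFAdapterFFN()(x).shape)         # torch.Size([2, 16, 768])
```

Freezing the original projections and training only the added heads and hidden units is what keeps the trainable fraction of parameters small (about 17% according to the abstract) and is what the paper credits with mitigating catastrophic forgetting relative to full fine-tuning.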
