Paper Title
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Paper Authors
Paper Abstract
Pre-trained contextual representations (e.g., BERT) have become the foundation for achieving state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to the reduced complexity of its pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training inputs and candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
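To make the pre-training objective concrete, below is a minimal sketch of a multi-choice cloze head with a reject option, assuming a small meta controller model has already proposed K candidate tokens per position. All module names, shapes, and the scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (assumed design, not the paper's code): each position is
# scored against K candidate token embeddings supplied by the meta controller,
# plus a learned "reject" option meaning "the current token is already correct".
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiChoiceClozeHead(nn.Module):
    def __init__(self, hidden_size, embedding_weight):
        super().__init__()
        self.embedding_weight = embedding_weight               # shared with input token embeddings
        self.reject = nn.Parameter(torch.zeros(hidden_size))   # vector scoring the reject option

    def forward(self, hidden_states, candidate_ids):
        # hidden_states: (batch, seq_len, hidden) from the main encoder
        # candidate_ids: (batch, seq_len, K) token ids proposed by the meta controller
        cand_emb = self.embedding_weight[candidate_ids]                     # (B, L, K, H)
        cand_scores = torch.einsum("blh,blkh->blk", hidden_states, cand_emb)
        reject_score = hidden_states @ self.reject                          # (B, L)
        # Option 0 is "reject" (keep the current token); options 1..K are candidates.
        return torch.cat([reject_score.unsqueeze(-1), cand_scores], dim=-1)


def multi_choice_loss(logits, labels):
    # labels: index of the correct option per position (0 = reject);
    # -100 marks positions excluded from the loss.
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1), ignore_index=-100)
```

Compared with ELECTRA's binary replaced-token detection, classifying among K semantically related candidates (plus a reject option) keeps the task cheap while forcing the encoder to make finer-grained semantic distinctions.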