Paper Title
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Paper Authors
Paper Abstract
Pre-trained contextual representations (e.g., BERT) have become the foundation for achieving state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to the reduced complexity of its pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training inputs and candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
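To make the pre-training objective concrete, below is a minimal sketch of a multi-choice cloze head with a reject option, assuming a small meta controller model has already proposed K candidate tokens per position. All module names, shapes, and the scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (assumed design, not the paper's code): each position is
# scored against K candidate token embeddings supplied by the meta controller,
# plus a learned "reject" option meaning "the current token is already correct".
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiChoiceClozeHead(nn.Module):
    def __init__(self, hidden_size, embedding_weight):
        super().__init__()
        self.embedding_weight = embedding_weight               # shared with input token embeddings
        self.reject = nn.Parameter(torch.zeros(hidden_size))   # vector scoring the reject option

    def forward(self, hidden_states, candidate_ids):
        # hidden_states: (batch, seq_len, hidden) from the main encoder
        # candidate_ids: (batch, seq_len, K) token ids proposed by the meta controller
        cand_emb = self.embedding_weight[candidate_ids]                     # (B, L, K, H)
        cand_scores = torch.einsum("blh,blkh->blk", hidden_states, cand_emb)
        reject_score = hidden_states @ self.reject                          # (B, L)
        # Option 0 is "reject" (keep the current token); options 1..K are candidates.
        return torch.cat([reject_score.unsqueeze(-1), cand_scores], dim=-1)


def multi_choice_loss(logits, labels):
    # labels: index of the correct option per position (0 = reject);
    # -100 marks positions excluded from the loss.
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1), ignore_index=-100)
```

Compared with ELECTRA's binary replaced-token detection, classifying among K semantically related candidates (plus a reject option) keeps the task cheap while forcing the encoder to make finer-grained semantic distinctions.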