Paper Title
BERTVision -- A Parameter-Efficient Approach for Question Answering
Paper Authors
Paper Abstract
We present a highly parameter-efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning. Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference. Our best model achieves maximal BERT performance at a fraction of the training time and GPU or TPU expense. Performance is further improved by ensembling our model with BERT's predictions. Furthermore, we find that near-optimal performance can be achieved for QA span annotation using less training data. Our experiments show that this approach works well not only for span annotation, but also for classification, suggesting that it may be extensible to a wider range of tasks.
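The core idea of consuming the hidden states of every transformer layer, rather than only the final layer, can be sketched as follows. This is a minimal illustrative example, not the paper's actual architecture: the layer-weighting scheme, the `SpanHead` module, and the tensor shapes are assumptions, and the per-layer activations (which in practice would come from a BERT model run with hidden-state outputs enabled) are simulated with random tensors.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a small, parameter-efficient head that reads the hidden
# states of all 12 BERT layers instead of only the last one. Real activations
# would come from BERT; here they are simulated with random tensors.
num_layers, batch, seq_len, hidden = 12, 2, 16, 768

# Simulated per-layer activations: (batch, num_layers, seq_len, hidden)
hidden_states = torch.randn(batch, num_layers, seq_len, hidden)

class SpanHead(nn.Module):
    """Collapse the layer axis with learned weights, then predict
    per-token start/end logits for QA span annotation."""
    def __init__(self, num_layers: int, hidden: int):
        super().__init__()
        # Learned weighted average over the layer dimension.
        self.layer_weights = nn.Parameter(torch.ones(num_layers) / num_layers)
        self.qa_outputs = nn.Linear(hidden, 2)  # start and end logits

    def forward(self, states: torch.Tensor):
        # Weighted sum over layers: (batch, seq_len, hidden)
        pooled = torch.einsum("blsh,l->bsh", states, self.layer_weights)
        logits = self.qa_outputs(pooled)         # (batch, seq_len, 2)
        start, end = logits.split(1, dim=-1)
        return start.squeeze(-1), end.squeeze(-1)

head = SpanHead(num_layers, hidden)
start_logits, end_logits = head(hidden_states)
print(start_logits.shape)  # torch.Size([2, 16])
```

Only the head's parameters (a 12-element weight vector plus one linear layer) would be trained here, which is what makes this family of approaches cheap relative to full BERT fine-tuning.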