Paper Title
Faster Depth-Adaptive Transformers
Authors
Abstract
Depth-adaptive neural networks can dynamically adjust their depth according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure such hardness and decide the required depth (i.e., number of layers) to compute. Previous work generally builds a halting unit to decide whether the computation should continue or stop at each layer. As there is no explicit supervision of depth selection, the halting unit may be under-optimized and inaccurate, resulting in suboptimal and unstable performance when modeling sentences. In this paper, we dispense with the halting unit and estimate the required depths in advance, which yields a faster depth-adaptive model. Specifically, two approaches are proposed to explicitly measure the hardness of input words and estimate the corresponding adaptive depths: 1) mutual information (MI) based estimation and 2) reconstruction loss based estimation. We conduct experiments on text classification with 24 datasets of various sizes and domains. Results confirm that our approaches can speed up the vanilla Transformer (by up to 7x) while preserving high accuracy. Moreover, efficiency and robustness are significantly improved compared with other depth-adaptive approaches.
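To make the MI-based idea concrete, here is a minimal sketch of one way such an estimator could look: compute the mutual information between a word's presence and the class labels as a hardness proxy, then quantize that score into a per-word depth. This is an illustration under stated assumptions, not the paper's exact formulation; the corpus format (`(token_set, label)` pairs), the add-one smoothing, and the linear hardness-to-depth mapping are all assumptions made for this example.

```python
import math
from collections import Counter

def word_label_mi(word, corpus):
    """Estimate I(W; Y): mutual information between the presence of `word`
    and the label Y, over `corpus` = list of (set_of_tokens, label) pairs.
    Add-one smoothing (an assumption of this sketch) avoids log(0)."""
    n = len(corpus)
    label_counts = Counter(y for _, y in corpus)
    mi = 0.0
    for w_present in (True, False):
        n_w = sum(1 for toks, _ in corpus if (word in toks) == w_present)
        p_w = (n_w + 1) / (n + 2)
        for y, n_y in label_counts.items():
            n_wy = sum(1 for toks, yy in corpus
                       if (word in toks) == w_present and yy == y)
            p_wy = (n_wy + 1) / (n + 2 * len(label_counts))
            p_y = n_y / n
            mi += p_wy * math.log(p_wy / (p_w * p_y))
    return mi

def hardness_to_depth(score, max_score, max_depth=6):
    """Linearly quantize a hardness score into a depth in [1, max_depth]
    (the mapping is a placeholder; other monotone mappings would also fit)."""
    frac = min(score / max_score, 1.0) if max_score > 0 else 0.0
    return max(1, math.ceil(frac * max_depth))
```

With depths fixed in advance this way, each token's number of Transformer layers is known before the forward pass, which is what removes the per-layer halting decision.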