Paper Title
Parallel Attention Forcing for Machine Translation
Paper Authors
Paper Abstract
Attention-based autoregressive models have achieved state-of-the-art performance in various sequence-to-sequence tasks, including Text-To-Speech (TTS) and Neural Machine Translation (NMT), but can be difficult to train. The standard training approach, teacher forcing, guides a model with the reference back-history. During inference, the generated back-history must be used. This mismatch limits evaluation performance. Attention forcing has been introduced to address the mismatch, guiding the model with the generated back-history and the reference attention. While successful in tasks with continuous outputs like TTS, attention forcing faces additional challenges in tasks with discrete outputs like NMT. This paper introduces two extensions of attention forcing to tackle these challenges. (1) Scheduled attention forcing automatically turns attention forcing on and off, which is essential for tasks with discrete outputs. (2) Parallel attention forcing makes training parallel, and is applicable to Transformer-based models. The experiments show that the proposed approaches improve the performance of models based on RNNs and Transformers.
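To make the contrast between teacher forcing and attention forcing concrete, the sketch below shows a single decoder step under each training regime. It is a minimal, hypothetical PyTorch-style outline, not the paper's implementation: the decoder and attention callables, the tensor shapes, and the function names are all assumptions. The intended point is which back-history feeds the next step (reference vs. generated tokens) and which attention weights form the context vector (generated vs. reference alignment).

# Minimal sketch of the two training regimes described in the abstract.
# Hypothetical interfaces; assumed shapes: enc_out [B, T_enc, H],
# attention weights [B, T_enc], state is whatever the decoder RNN returns.
import torch

def teacher_forcing_step(decoder, attention, enc_out, state, y_ref_prev):
    """One step guided by the reference back-history (standard training)."""
    context, alpha_gen = attention(state, enc_out)        # model's own alignment
    logits, state = decoder(y_ref_prev, context, state)   # fed the *reference* token
    return logits, alpha_gen, state

def attention_forcing_step(decoder, attention, enc_out, state, y_gen_prev, alpha_ref):
    """One step guided by the generated back-history and the reference attention."""
    _, alpha_gen = attention(state, enc_out)              # still computed (e.g. for an attention loss)
    # Build the context vector from the *reference* alignment instead.
    context = torch.bmm(alpha_ref.unsqueeze(1), enc_out).squeeze(1)
    logits, state = decoder(y_gen_prev, context, state)   # fed the *generated* token
    return logits, alpha_gen, state

Under this reading, scheduled attention forcing would switch between the two step functions during training, and parallel attention forcing would replace the sequential forced pass with a parallel one for Transformer-based models; the switching criterion and the parallelisation details are given in the paper itself.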