Paper Title

DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding

Authors

Jianhao Yan, Jin Xu, Fandong Meng, Jie Zhou, Yue Zhang

Abstract


Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding algorithm in Neural Machine Translation. However, MBR performs poorly with label smoothing, which is surprising, as label smoothing provides decent improvement with beam search and improves generality in various tasks. In this work, we show that the issue arises from the inconsistency of label smoothing on the token-level and sequence-level distributions. We demonstrate that even though label smoothing causes only a slight change at the token level, the sequence-level distribution is highly skewed. We coin this issue \emph{autoregressive over-smoothness}. To address it, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of output distributions by tuning down the Softmax temperature. We theoretically prove the equivalence between pre-tuning the label smoothing factor and distributional cooling. Extensive experiments on NMT benchmarks validate that distributional cooling improves MBR in various settings.
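The core operation the abstract describes, lowering the entropy of the output distribution by dividing logits by a Softmax temperature T < 1 before sampling candidates for MBR, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names are ours.

```python
import math

def cooled_softmax(logits, temperature=0.5):
    """Softmax with temperature; T < 1 ("cooling") sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Toy token-level logits for a 3-word vocabulary.
logits = [2.0, 1.0, 0.1]
p_standard = cooled_softmax(logits, temperature=1.0)
p_cooled = cooled_softmax(logits, temperature=0.5)

# Cooling lowers entropy: the cooled distribution is sharper.
assert entropy(p_cooled) < entropy(p_standard)
```

In a full DC-MBR pipeline, this cooled distribution would replace the model's standard output distribution when generating the candidate/pseudo-reference pool, counteracting the sequence-level smoothing induced by label-smoothed training.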
