Paper Title

A Call for Clarity in Beam Search: How It Works and When It Stops

Paper Authors

Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, Noah A. Smith

Paper Abstract

Text generation with beam search has proven successful in a wide range of applications. We point out that, though largely overlooked in the literature, the commonly-used implementation of beam decoding (e.g., Hugging Face Transformers and fairseq) uses a first come, first served heuristic: it keeps a set of already completed sequences over time steps and stops when the size of this set reaches the beam size. Based on this finding, we introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search. Empirical results demonstrate that adjusting this patience factor improves decoding performance of strong pretrained models on news text summarization and machine translation over diverse language pairs, with a negligible inference slowdown. Our approach only modifies one line of code and can thus be readily incorporated into any implementation. Further, we find that different versions of beam decoding result in large performance differences in summarization, demonstrating the need for clarity in specifying the beam search implementation in research work. Our code will be available upon publication.
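To make the stopping criterion concrete, here is a minimal sketch of the first come, first served rule and the patience factor described in the abstract. Everything here is a hypothetical stand-in, not the authors' code: `next_scores`, `VOCAB`, and the toy scoring logits are invented for illustration, and real implementations such as Hugging Face Transformers and fairseq differ in many details (length penalty, batching, etc.). The one relevant line is the stopping check, where `patience = 1.0` recovers the common default and larger values let the search run deeper before terminating.

```python
import math

VOCAB = [0, 1, 2]  # token 0 acts as end-of-sequence (EOS)

def next_scores(seq):
    # Hypothetical toy distribution over the next token: EOS becomes
    # more likely as the sequence grows. Returns log-probabilities.
    logits = [len(seq) * 0.5, 1.0, 0.3]
    log_z = math.log(sum(math.exp(x) for x in logits))
    return [x - log_z for x in logits]

def beam_search(beam_size=2, patience=1.0, max_len=10):
    # Each hypothesis is (cumulative log-prob, token sequence).
    beams = [(0.0, [])]
    finished = []  # set of already completed (EOS-ended) sequences
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, lp in zip(VOCAB, next_scores(seq)):
                candidates.append((score + lp, seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates:
            if seq[-1] == 0:
                # First come, first served: completed hypotheses are
                # banked in arrival order, regardless of later arrivals.
                finished.append((score, seq))
            else:
                beams.append((score, seq))
            if len(beams) == beam_size:
                break
        # The vanilla rule stops once the finished set reaches beam_size;
        # the patience factor generalizes the threshold to
        # patience * beam_size (the abstract's one-line modification).
        if len(finished) >= patience * beam_size:
            break
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[0])
```

For example, `beam_search(patience=2.0)` keeps decoding until twice as many completed sequences have been collected, trading a slightly longer search for a larger pool of finished candidates.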
