Paper Title

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

Authors

Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer

Abstract

We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy. With the rise in popularity of neural-transducer type models like the RNN-T for on-device ASR, optimizing RNN-T's runtime efficiency is of great interest. While previous work has primarily focused on the optimization of RNN-T's acoustic encoder and predictor, this paper focuses the attention on the joiner. We show that despite being only a small part of RNN-T, the joiner has a large impact on the overall model's runtime efficiency. We propose to utilize HAT-style joiner factorization for the purpose of skipping the more expensive non-blank computation when the blank probability exceeds a certain threshold. Since the blank probability can be computed very efficiently and the RNN-T output is dominated by blanks, our proposed method leads to a 26-30% decoding speed-up and 43-53% reduction in on-device power consumption, all the while incurring no accuracy degradation and being relatively simple to implement.
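The core idea above — compute a cheap blank probability first, and only run the expensive full-vocabulary joiner projection when it falls below a threshold — can be sketched as follows. This is a minimal illustrative sketch of HAT-style blank thresholding in a greedy-decoding step, not the paper's implementation; the additive joiner, the projection shapes, the `BLANK` sentinel, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

BLANK = -1  # hypothetical sentinel id for the blank symbol


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def factorized_joiner_step(enc, pred, w_blank, W_nonblank, threshold=0.9):
    """One greedy-decoding step with HAT-style blank thresholding.

    enc, pred    -- encoder / predictor output vectors (assumed same dim)
    w_blank      -- cheap blank projection: a single vector (scalar output)
    W_nonblank   -- full non-blank vocabulary projection matrix (expensive)
    threshold    -- if P(blank) exceeds this, skip the non-blank computation
    """
    h = np.tanh(enc + pred)            # simple additive joiner combination
    p_blank = sigmoid(w_blank @ h)     # HAT: blank modeled by a cheap sigmoid
    if p_blank > threshold:
        # Blank dominates: emit blank without touching W_nonblank at all.
        return BLANK, p_blank
    logits = W_nonblank @ h            # expensive full-vocabulary projection
    probs = (1.0 - p_blank) * softmax(logits)  # HAT factorized distribution
    token = int(np.argmax(probs))
    if p_blank > probs[token]:         # blank can still win the argmax
        return BLANK, p_blank
    return token, float(probs[token])
```

Since RNN-T output frames are dominated by blanks, most decoding steps take the early-return path and never pay for the full vocabulary projection, which is where the reported speed-up and power savings come from.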
