Paper Title

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

Authors

Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer

Abstract

We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy. With the rise in popularity of neural-transducer type models like the RNN-T for on-device ASR, optimizing RNN-T's runtime efficiency is of great interest. While previous work has primarily focused on the optimization of RNN-T's acoustic encoder and predictor, this paper focuses the attention on the joiner. We show that despite being only a small part of RNN-T, the joiner has a large impact on the overall model's runtime efficiency. We propose to utilize HAT-style joiner factorization for the purpose of skipping the more expensive non-blank computation when the blank probability exceeds a certain threshold. Since the blank probability can be computed very efficiently and the RNN-T output is dominated by blanks, our proposed method leads to a 26-30% decoding speed-up and 43-53% reduction in on-device power consumption, all the while incurring no accuracy degradation and being relatively simple to implement.
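The core idea above — compute a cheap blank probability first, and only run the expensive full-vocabulary joiner projection when it falls below a threshold — can be sketched as follows. This is a minimal illustrative sketch of HAT-style blank thresholding in a greedy-decoding step, not the paper's implementation; the additive joiner, the projection shapes, the `BLANK` sentinel, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

BLANK = -1  # hypothetical sentinel id for the blank symbol


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def factorized_joiner_step(enc, pred, w_blank, W_nonblank, threshold=0.9):
    """One greedy-decoding step with HAT-style blank thresholding.

    enc, pred    -- encoder / predictor output vectors (assumed same dim)
    w_blank      -- cheap blank projection: a single vector (scalar output)
    W_nonblank   -- full non-blank vocabulary projection matrix (expensive)
    threshold    -- if P(blank) exceeds this, skip the non-blank computation
    """
    h = np.tanh(enc + pred)            # simple additive joiner combination
    p_blank = sigmoid(w_blank @ h)     # HAT: blank modeled by a cheap sigmoid
    if p_blank > threshold:
        # Blank dominates: emit blank without touching W_nonblank at all.
        return BLANK, p_blank
    logits = W_nonblank @ h            # expensive full-vocabulary projection
    probs = (1.0 - p_blank) * softmax(logits)  # HAT factorized distribution
    token = int(np.argmax(probs))
    if p_blank > probs[token]:         # blank can still win the argmax
        return BLANK, p_blank
    return token, float(probs[token])
```

Since RNN-T output frames are dominated by blanks, most decoding steps take the early-return path and never pay for the full vocabulary projection, which is where the reported speed-up and power savings come from.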
