WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

Paper title

WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

Authors

Zhang, Zixing; Farnsworth, Thorin; Lin, Senling; Karout, Salah

Abstract

End-to-end models have gradually become the main technical stream for voice trigger, aiming to achieve the highest possible prediction accuracy with a small footprint. In this paper, we propose an end-to-end voice trigger framework, namely WakeupNet, which is built on a Transformer encoder. The purpose of this framework is to explore the context-capturing capability of the Transformer, as sequential information is vital for wakeup-word detection. However, the conventional Transformer encoder is too large to fit our task. To address this issue, we introduce different model compression approaches to shrink the vanilla encoder into a tiny one, called mobile-Transformer. To evaluate the performance of mobile-Transformer, we conduct extensive experiments on a large publicly available dataset, HiMia. The obtained results indicate that the introduced mobile-Transformer significantly outperforms other frequently used models for voice trigger in both clean and noisy scenarios.
