Paper title
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting
Authors
Abstract
Building efficient architectures in neural speech processing is paramount to successful keyword spotting deployment. However, it is very challenging for lightweight models to achieve noise robustness with concise neural operations. In real-world applications, the user environment is typically noisy and may also contain reverberation. We propose a novel feature interactive convolutional model with merely 100K parameters to tackle this problem under noisy far-field conditions. An interactive unit is proposed in place of the attention module; it promotes the flow of information with more efficient computation. Moreover, curriculum-based multi-condition training is adopted to attain better noise robustness. Our model achieves 98.2% top-1 accuracy on Google Speech Commands V2-12 and is competitive with large transformer models under the designed noise conditions.
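The abstract does not spell out how the curriculum-based multi-condition training is scheduled. As a rough illustration only, the sketch below mixes noise into a clean utterance at a target signal-to-noise ratio and widens the pool of conditions stage by stage, from clean speech to harder SNRs. The function names, the SNR schedule (clean, 20 dB, 10 dB, 0 dB), and the stage logic are all assumptions made for this example, not the authors' configuration, and reverberation for the far-field case is not modelled.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Assumes both sequences have the same length and non-zero power.
    """
    # Choose gain g so that power(clean) / (g^2 * power(noise)) == 10^(snr_db / 10).
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def conditions_for_stage(stage, schedule=(None, 20.0, 10.0, 0.0)):
    """Conditions available at a curriculum stage (hypothetical schedule).

    `None` means clean speech. Stage 0 is clean only; each later stage
    adds one harder SNR while keeping the easier conditions in the pool.
    """
    return list(schedule[: stage + 1])


def sample_training_example(clean, noise, stage, rng):
    """Draw one condition allowed at `stage` and return (audio, snr_db)."""
    snr_db = rng.choice(conditions_for_stage(stage))
    if snr_db is None:
        return list(clean), None
    return mix_at_snr(clean, noise, snr_db), snr_db


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.1 * t) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
    for stage in range(4):
        _, snr_db = sample_training_example(clean, noise, stage, rng)
        print(stage, conditions_for_stage(stage), snr_db)
```

Keeping the easier conditions in the pool at later stages is what makes this a multi-condition scheme rather than a pure hard-example schedule; other designs, such as advancing stages based on validation accuracy, are equally plausible readings of the abstract.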