Paper Title
MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances
Authors
Abstract
The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification. However, TDNNs require a large number of filters to capture speaker characteristics in any local frequency region. In addition, the performance of such systems may degrade in short-utterance scenarios. To address these issues, we propose a multi-scale frequency-channel attention (MFA) mechanism, in which speakers are characterized at different scales through a novel dual-path design consisting of a convolutional neural network and a TDNN. We evaluate the proposed MFA on the VoxCeleb database and observe that the framework with MFA achieves state-of-the-art performance while reducing the number of parameters and the computational complexity. Furthermore, the MFA mechanism is found to be effective for speaker verification with short test utterances.