Paper Title


Pushing the limits of self-supervised speaker verification using regularized distillation framework

Authors

Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen

Abstract


Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to the embeddings in DINO. One regularization term guarantees the diversity of the embeddings, while the other decorrelates the variables of each embedding. The effectiveness of various data augmentation techniques is explored, in both the time and frequency domains. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the regularized DINO framework for speaker verification. Our method achieves state-of-the-art speaker verification performance under a single-stage self-supervised setting on VoxCeleb. Code has been made publicly available at https://github.com/alibaba-damo-academy/3D-Speaker.
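To make the two regularization terms concrete, here is a minimal NumPy sketch of one common way to implement them: a hinge on the per-dimension standard deviation to encourage embedding diversity, and a penalty on off-diagonal covariance entries to decorrelate the variables of each embedding. The function names, the hinge target `gamma`, and the exact normalization are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def diversity_loss(emb, gamma=1.0, eps=1e-4):
    # Hinge on the standard deviation of each embedding dimension across
    # the batch: dimensions whose std falls below `gamma` are penalized,
    # which discourages all embeddings from collapsing to one point.
    # `emb` has shape (batch_size, embedding_dim).
    std = np.sqrt(emb.var(axis=0) + eps)
    return np.mean(np.maximum(0.0, gamma - std))

def decorrelation_loss(emb):
    # Penalize the squared off-diagonal entries of the covariance matrix
    # of the batch, pushing the variables of each embedding toward being
    # mutually decorrelated.
    n, d = emb.shape
    centered = emb - emb.mean(axis=0)
    cov = centered.T @ centered / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return np.sum(off_diag ** 2) / d

# Illustration: a collapsed batch (all embeddings identical) incurs a
# large diversity penalty, while a well-spread random batch does not.
rng = np.random.default_rng(0)
spread = rng.standard_normal((256, 8))
collapsed = np.ones((256, 8))
print(diversity_loss(collapsed), diversity_loss(spread))
```

In training, these terms would be added (with weighting coefficients) to the DINO distillation loss; the weights and where the terms attach (student vs. teacher embeddings) are design choices detailed in the paper and code.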
