Paper title

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

Paper authors

Stafylakis, Themos; Mosner, Ladislav; Kakouros, Sofoklis; Plchot, Oldrich; Burget, Lukas; Cernocky, Jan

Paper abstract

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alternative way of extracting speaker and emotion information from self-supervised trained models, based on the correlations between the coefficients of the representations - correlation pooling. We show improvements over mean pooling and further gains when the pooling methods are combined via fusion. The code is available at github.com/Lamomal/s3prl_correlation.
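
To make the contrast between the two pooling strategies concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that lives in the repository named above); it assumes the input is a list of T frames with D channels each, uses the Pearson correlation between channel pairs, and keeps only the upper triangle of the correlation matrix as the pooled vector. The function names, the `eps` stabilizer, and the upper-triangle choice are illustrative assumptions, not details confirmed by the abstract.

```python
import math


def mean_pool(frames):
    """Mean pooling: average each channel over time (T x D -> D)."""
    num_frames = len(frames)
    return [sum(channel) / num_frames for channel in zip(*frames)]


def correlation_pool(frames, eps=1e-8):
    """Correlation pooling sketch (T x D -> D*(D-1)/2).

    Computes the Pearson correlation between every pair of channels
    across time and returns the upper triangle (diagonal excluded)
    as a flat list. `eps` is an assumed guard against zero variance.
    """
    num_frames = len(frames)
    channels = list(zip(*frames))  # D sequences, each of length T
    means = [sum(c) / num_frames for c in channels]
    centered = [[x - m for x in c] for c, m in zip(channels, means)]
    norms = [math.sqrt(sum(x * x for x in c)) + eps for c in centered]

    pooled = []
    num_channels = len(channels)
    for i in range(num_channels):
        for j in range(i + 1, num_channels):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            pooled.append(dot / (norms[i] * norms[j]))
    return pooled


# Toy input: 3 frames, 3 channels. Channel 1 is exactly 2x channel 0.
toy_frames = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
]
mean_vector = mean_pool(toy_frames)
corr_vector = correlation_pool(toy_frames)
```

In this toy input the first two channels move together, so their correlation is close to 1, while the third channel is uncorrelated with both. Mean pooling discards that relationship entirely, which is the kind of information correlation pooling is meant to retain. A fusion of the two, as the abstract mentions, could be as simple as concatenating or score-averaging the outputs, but the exact fusion scheme is not specified here.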
