Paper Title

Emotional Speech Recognition with Pre-trained Deep Visual Models

Authors

Waleed Ragheb, Mehdi Mirzapour, Ali Delfardi, Hélène Jacquenet, Lawrence Carbon

Abstract

In this paper, we propose a new methodology for emotional speech recognition using visual deep neural network models. We employ the transfer learning capabilities of pre-trained computer vision deep models to support emotion recognition in the speech task. To achieve this, we propose a composite set of acoustic features and a procedure for converting them into images. In addition, we present a training paradigm for these models that takes into account the differences between acoustic-based images and regular images. In our experiments, we use the pre-trained VGG-16 model and test the overall methodology on the Berlin EMO-DB dataset for speaker-independent emotion recognition. We evaluate the proposed model on the full list of seven emotions, and the results set a new state of the art.
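The abstract describes converting a composite set of acoustic features into images and fine-tuning a pre-trained VGG-16 on them. As a rough illustration of that kind of pipeline (not the authors' exact method), the sketch below uses a log-mel spectrogram as a stand-in for the paper's unspecified composite features, resizes it to the 224×224 RGB input expected by an ImageNet-pretrained VGG-16, and replaces the classifier head for the seven Emo-DB emotion classes. The file name `example.wav`, the 16 kHz sampling rate, and the choice of log-mel features are illustrative assumptions.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical sketch: turn one utterance into a spectrogram "image" and
# feed it to a pre-trained VGG-16 whose head predicts 7 emotion classes.
# The paper's actual composite acoustic features and image-conversion
# procedure may differ; a log-mel spectrogram is used here as an example.

def speech_to_image(wav_path, n_mels=128, target_size=224):
    y, sr = librosa.load(wav_path, sr=16000)                       # assumed 16 kHz
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Min-max normalise to [0, 1] and resize so the tensor matches the
    # 224x224 input of ImageNet-pretrained VGG-16.
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    img = torch.tensor(img, dtype=torch.float32).unsqueeze(0)      # (1, H, W)
    img = torch.nn.functional.interpolate(
        img.unsqueeze(0), size=(target_size, target_size),
        mode="bilinear", align_corners=False,
    ).squeeze(0)                                                    # (1, 224, 224)
    return img.repeat(3, 1, 1)                                      # replicate to 3 channels

# Pre-trained VGG-16 with its last linear layer replaced for 7 emotions.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 7)

x = speech_to_image("example.wav").unsqueeze(0)   # add batch dimension -> (1, 3, 224, 224)
logits = model(x)                                  # (1, 7) emotion scores
```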
