Paper Title

Emotional Speech Recognition with Pre-trained Deep Visual Models

Authors

Waleed Ragheb, Mehdi Mirzapour, Ali Delfardi, Hélène Jacquenet, Lawrence Carbon

Abstract

In this paper, we propose a new methodology for emotional speech recognition using visual deep neural network models. We employ the transfer learning capabilities of pre-trained computer vision deep models to support emotion recognition in the speech task. To achieve this, we propose a composite set of acoustic features and a procedure for converting them into images. In addition, we present a training paradigm for these models that takes into account the differences between acoustic-based images and regular images. In our experiments, we use the pre-trained VGG-16 model and test the overall methodology on the Berlin EMO-DB dataset for speaker-independent emotion recognition. We evaluate the proposed model on the full list of seven emotions, and the results set a new state of the art.
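The abstract describes converting a composite set of acoustic features into images and fine-tuning a pre-trained VGG-16 on them. As a rough illustration of that kind of pipeline (not the authors' exact method), the sketch below uses a log-mel spectrogram as a stand-in for the paper's unspecified composite features, resizes it to the 224×224 RGB input expected by an ImageNet-pretrained VGG-16, and replaces the classifier head for the seven Emo-DB emotion classes. The file name `example.wav`, the 16 kHz sampling rate, and the choice of log-mel features are illustrative assumptions.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical sketch: turn one utterance into a spectrogram "image" and
# feed it to a pre-trained VGG-16 whose head predicts 7 emotion classes.
# The paper's actual composite acoustic features and image-conversion
# procedure may differ; a log-mel spectrogram is used here as an example.

def speech_to_image(wav_path, n_mels=128, target_size=224):
    y, sr = librosa.load(wav_path, sr=16000)                       # assumed 16 kHz
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Min-max normalise to [0, 1] and resize so the tensor matches the
    # 224x224 input of ImageNet-pretrained VGG-16.
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    img = torch.tensor(img, dtype=torch.float32).unsqueeze(0)      # (1, H, W)
    img = torch.nn.functional.interpolate(
        img.unsqueeze(0), size=(target_size, target_size),
        mode="bilinear", align_corners=False,
    ).squeeze(0)                                                    # (1, 224, 224)
    return img.repeat(3, 1, 1)                                      # replicate to 3 channels

# Pre-trained VGG-16 with its last linear layer replaced for 7 emotions.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 7)

x = speech_to_image("example.wav").unsqueeze(0)   # add batch dimension -> (1, 3, 224, 224)
logits = model(x)                                  # (1, 7) emotion scores
```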
