Paper Title
Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition
Paper Authors
Paper Abstract
Audio-video based emotion recognition aims to classify a given video into one of the basic emotions. In this paper, we describe our approaches in EmotiW 2019, which mainly explore emotion features and feature fusion strategies for the audio and visual modalities. For emotion features, we explore audio features based on both the speech spectrogram and the log Mel-spectrogram, and we evaluate several facial features extracted with different CNN models and different emotion pretraining strategies. For fusion strategies, we explore intra-modal and cross-modal fusion methods, such as designing attention mechanisms to highlight important emotion features, and investigating feature concatenation and factorized bilinear pooling (FBP) for cross-modal feature fusion. With careful evaluation, we obtain 65.5% accuracy on the AFEW validation set and 62.48% on the test set, ranking third in the challenge.
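The factorized bilinear pooling (FBP) mentioned in the abstract is a low-rank approximation of a full bilinear interaction between the audio and visual feature vectors. The PyTorch sketch below is only a minimal illustration of this fusion idea; the feature dimensions, factor size k, dropout rate, and class/variable names are assumptions for demonstration and not the settings used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedBilinearPooling(nn.Module):
    """Minimal FBP sketch for fusing an audio feature vector with a visual
    feature vector. All dimensions below are illustrative assumptions."""

    def __init__(self, audio_dim=1582, visual_dim=1024, out_dim=256, k=4, dropout=0.1):
        super().__init__()
        self.k = k
        # Project each modality into a shared (out_dim * k)-dimensional space.
        self.audio_proj = nn.Linear(audio_dim, out_dim * k)
        self.visual_proj = nn.Linear(visual_dim, out_dim * k)
        self.dropout = nn.Dropout(dropout)

    def forward(self, audio_feat, visual_feat):
        # Element-wise product of the two projections approximates the
        # bilinear interaction with low-rank factor matrices.
        joint = self.audio_proj(audio_feat) * self.visual_proj(visual_feat)
        joint = self.dropout(joint)
        # Sum-pool every k consecutive units down to out_dim.
        joint = joint.view(joint.size(0), -1, self.k).sum(dim=2)
        # Power normalization followed by L2 normalization, as is common for FBP.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-12)
        return F.normalize(joint, p=2, dim=1)


# Usage: fuse a batch of audio and facial features into a joint emotion embedding.
fbp = FactorizedBilinearPooling()
fused = fbp(torch.randn(8, 1582), torch.randn(8, 1024))  # -> shape (8, 256)
```

The fused embedding would then feed a classifier over the basic emotion categories; the attention mechanisms and feature-concatenation variants described in the abstract are alternative or complementary fusion paths not shown here.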