Paper Title

Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors

Paper Authors

Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland

Paper Abstract

Emotion recognition is a key attribute for artificial intelligence systems that need to naturally interact with humans. However, the task definition is still an open problem due to the inherent ambiguity of emotions. In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes. An additional metric is used to evaluate the performance by detecting test utterances with high labelling uncertainty. This removes a major limitation that emotion classification systems only consider utterances with labels where the majority of annotators agree on the emotion class. Furthermore, a frequentist approach is studied to leverage the continuous-valued "soft" labels obtained by averaging the one-hot labels. We propose a two-branch model structure for emotion classification on a per-utterance basis, which achieves state-of-the-art classification results on the widely used IEMOCAP dataset. Based on this, uncertainty estimation experiments were performed. The best performance in terms of the area under the precision-recall curve when detecting utterances with high uncertainty was achieved by interpolating the Bayesian training loss with the Kullback-Leibler divergence training loss for the soft labels. The generality of the proposed approach was verified using the MSP-Podcast dataset, which yielded the same pattern of results.
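
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The soft label (average of the annotators' one-hot labels) and the Kullback-Leibler divergence follow directly from the abstract. The Dirichlet term is an assumption: it scores each annotator's label by the marginal class probability alpha[c] / sum(alpha) of a Dirichlet with concentration parameters alpha, which may differ from the exact loss in the paper. The function names, the example alpha values, and the interpolation weight are all hypothetical.

```python
import math


def soft_label(one_hot_labels):
    """Average the annotators' one-hot labels into one soft label."""
    n = len(one_hot_labels)
    num_classes = len(one_hot_labels[0])
    return [sum(lbl[c] for lbl in one_hot_labels) / n for c in range(num_classes)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def dirichlet_nll(one_hot_labels, alpha):
    """ASSUMPTION: negative log-likelihood of each annotator's label under the
    Dirichlet marginal, where P(class c) = alpha[c] / sum(alpha)."""
    alpha0 = sum(alpha)
    return -sum(math.log(alpha[lbl.index(1)] / alpha0) for lbl in one_hot_labels)


def interpolated_loss(one_hot_labels, alpha, weight=0.5):
    """Interpolate the two losses; `weight` is a hypothetical hyperparameter."""
    alpha0 = sum(alpha)
    predicted = [a / alpha0 for a in alpha]  # expected class distribution
    kl = kl_divergence(soft_label(one_hot_labels), predicted)
    return weight * dirichlet_nll(one_hot_labels, alpha) + (1 - weight) * kl


# Three annotators label one utterance over four emotion classes.
labels = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
# Hypothetical Dirichlet concentration parameters predicted by a model.
alpha = [4.0, 2.0, 1.0, 1.0]

print(soft_label(labels))
print(interpolated_loss(labels, alpha))
```

In this sketch the annotators disagree, so the soft label spreads its mass over two classes instead of collapsing to a single majority class. That disagreement is the labelling uncertainty the abstract refers to.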
