Voice Activity Projection: Self-supervised Learning of Turn-taking Events

Paper Title

Voice Activity Projection: Self-supervised Learning of Turn-taking Events

Authors

Erik Ekstedt, Gabriel Skantze

Abstract

The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of voice activity of the interlocutors. We extend prior work and define the predictive task of Voice Activity Projection, a general, self-supervised objective, as a way to train turn-taking models without the need of labeled data. We highlight a theoretical weakness with prior approaches, arguing for the need of modeling the dependency of voice activity events in the projection window. We propose four zero-shot tasks, related to the prediction of upcoming turn-shifts and backchannels, and show that the proposed model outperforms prior work.

Paper Title