Paper Title

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

Authors

Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki, Kohei Yatabe

Abstract

Improving the subjective sound quality of enhanced signals is one of the most important tasks in speech enhancement. To evaluate subjective quality, several perceptually motivated objective sound quality assessment (OSQA) methods have been proposed, such as PESQ (perceptual evaluation of speech quality). In most cases, however, such measures cannot be used directly to train a deep neural network (DNN), because popular OSQAs are non-differentiable with respect to the DNN parameters. A previous study therefore proposed approximating the OSQA score with an auxiliary DNN so that its gradient can be used to train the primary DNN. One problem with this approach is training instability caused by the approximation error of the score. To overcome this problem, we propose using stabilization techniques borrowed from reinforcement learning. Experiments that aim to increase the PESQ score as an example show that the proposed method (i) can stably train a DNN to increase PESQ, (ii) achieves the state-of-the-art PESQ score on a public dataset, and (iii) yields better sound quality than conventional methods in a subjective evaluation.
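
The surrogate-gradient idea in the abstract can be illustrated with a toy sketch. This is not the authors' method: the primary "network" is a single weight, the auxiliary model is a local least-squares line rather than a DNN, and `black_box_score` is a hypothetical quantized function standing in for a non-differentiable metric such as PESQ. The gradient clipping is a generic safeguard added for illustration only; the abstract does not say which reinforcement-learning stabilization techniques the paper uses, and none are reproduced here.

```python
import random


def black_box_score(y):
    """Hypothetical stand-in for a non-differentiable metric.

    Quantization makes the true gradient zero almost everywhere,
    so it cannot be used for gradient-based training directly.
    """
    return round(-4.0 * (y - 3.0) ** 2) / 4.0


def fit_linear_surrogate(ys, scores):
    """Fit score ~ slope * y + intercept by ordinary least squares."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_s = sum(scores) / n
    var = sum((y - mean_y) ** 2 for y in ys)
    cov = sum((y - mean_y) * (s - mean_s) for y, s in zip(ys, scores))
    slope = cov / var
    return slope, mean_s - slope * mean_y


def train(w=0.0, x=1.0, iters=200, lr=0.05, radius=0.5,
          n_samples=32, max_grad=5.0, seed=0):
    """Train the toy primary model y = w * x to raise the black-box score."""
    rng = random.Random(seed)
    for _ in range(iters):
        y = w * x
        # Query the black box around the current output (assumed cheap here).
        ys = [y + rng.uniform(-radius, radius) for _ in range(n_samples)]
        scores = [black_box_score(v) for v in ys]
        # The surrogate's slope plays the role of the auxiliary model's gradient.
        slope, _ = fit_linear_surrogate(ys, scores)
        # Generic clipping, an illustrative assumption, not the paper's technique.
        slope = max(-max_grad, min(max_grad, slope))
        # Chain rule: d(surrogate)/dw = slope * x; ascend to increase the score.
        w += lr * slope * x
    return w


if __name__ == "__main__":
    w = train()
    print("learned weight:", w, "score:", black_box_score(w))
```

In this toy setting the score peaks at an output of 3, so the learned weight should settle near 3 even though the score itself is piecewise constant.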
