Paper Title

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Authors

Huixiang Huang, Renjie Wu, Jingbiao Huang, Jucai Lin, Jun Yin

Abstract

Generative adversarial networks (GANs) still face several problems in speech enhancement (SE) tasks. Some GAN-based systems adopt the Pixel-to-Pixel structure directly, without any task-specific optimization, so the importance of the generator network has not been fully explored. Other related studies modify the generator network but operate in the time-frequency domain, which ignores the phase mismatch problem. To address these problems, this paper proposes a deep complex convolution recurrent GAN (DCCRGAN) structure. The complex module models the correlation between the magnitude and phase of the waveform and has been shown to be effective. The proposed structure is trained end to end. Different LSTM layers are used in the generator network to thoroughly explore the speech enhancement performance of DCCRGAN. The experimental results confirm that the proposed DCCRGAN outperforms state-of-the-art GAN-based SE systems.
