Paper Title

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models

Paper Authors

Matthew Baas, Herman Kamper

Paper Abstract

We propose AudioStyleGAN (ASGAN), a new generative adversarial network (GAN) for unconditional speech synthesis. As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation to probabilistically skip discriminator updates. ASGAN achieves state-of-the-art results in unconditional speech synthesis on the Google Speech Commands dataset. It is also substantially faster than the top-performing diffusion models. Through a design that encourages disentanglement, ASGAN is able to perform voice conversion and speech editing without being explicitly trained to do so. ASGAN demonstrates that GANs are still highly competitive with diffusion models. Code, models, samples: https://github.com/RF5/simple-asgan/.
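
The abstract mentions a modification to adaptive discriminator augmentation in which discriminator updates are probabilistically skipped. The following is a minimal, hypothetical PyTorch sketch of that idea only; the toy models, the non-saturating softplus loss, and the fixed `p_skip` value are illustrative assumptions and not the paper's exact training recipe.

```python
# Hypothetical sketch: skip the discriminator update with probability p_skip
# so the discriminator does not overpower the generator. Models and loss are
# placeholders, not the ASGAN architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, FEAT_DIM = 64, 128
G = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, FEAT_DIM))
D = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real: torch.Tensor, p_skip: float) -> None:
    """One GAN step; the discriminator update is skipped with probability p_skip."""
    # Discriminator update (probabilistically skipped).
    if torch.rand(()).item() >= p_skip:
        z = torch.randn(real.size(0), LATENT_DIM)
        fake = G(z).detach()
        loss_d = F.softplus(-D(real)).mean() + F.softplus(D(fake)).mean()
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
    # Generator update always runs.
    z = torch.randn(real.size(0), LATENT_DIM)
    loss_g = F.softplus(-D(G(z))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

train_step(torch.randn(8, FEAT_DIM), p_skip=0.3)
```

In this sketch the skip probability is a constant; in practice it could be driven by an overfitting heuristic of the kind adaptive discriminator augmentation uses, but that choice is left open here.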
