Articulation GAN: Unsupervised modeling of articulatory learning

Paper title

Articulation GAN: Unsupervised modeling of articulatory learning

Authors

Gašper Beguš, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

Abstract

Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new unsupervised generative model of speech production/synthesis. The Articulatory Generator more closely mimics human speech production by learning to generate articulatory representations (electromagnetic articulography or EMA) in a fully unsupervised manner. A separate pre-trained physical model (ema2wav) then transforms the generated EMA representations to speech waveforms, which get sent to the Discriminator for evaluation. Articulatory analysis suggests that the network learns to control articulators in a similar manner to humans during speech production. Acoustic analysis of the outputs suggests that the network learns to generate words that are both present and absent in the training distribution. We additionally discuss implications of articulatory representations for cognitive models of human language and speech technology in general.
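The pipeline described in the abstract runs Generator → EMA → pre-trained ema2wav → waveform → Discriminator. It can be illustrated as a single forward pass. The numpy code below is a minimal sketch under stated assumptions, not the authors' implementation. All function names, layer choices, and dimensions are hypothetical. `ema2wav_stub` is a placeholder that only mimics the role of the separate pre-trained ema2wav model. The sketch shows how data flows between the components and does not train anything.

```python
import numpy as np

# Hypothetical dimensions for illustration only; the abstract does not
# specify latent size, EMA channel count, frame count, or audio length.
LATENT_DIM = 100
EMA_CHANNELS = 12        # assumption: e.g. x/y coordinates of six EMA sensors
EMA_FRAMES = 50
SAMPLES_PER_FRAME = 320  # assumption: audio samples produced per EMA frame

rng = np.random.default_rng(0)

def articulatory_generator(z, w):
    """Hypothetical generator: latent vectors -> EMA trajectories (batch, channels, frames)."""
    h = np.tanh(z @ w)  # one dense layer standing in for the generator's deep network
    return h.reshape(-1, EMA_CHANNELS, EMA_FRAMES)

def ema2wav_stub(ema):
    """Placeholder for the separate, pre-trained ema2wav model.
    It has no trainable weights here; in the described setup it stays fixed,
    so only the generator and discriminator would be updated."""
    frame_signal = ema.mean(axis=1)  # collapse channels: (batch, frames)
    # Hold each frame value for SAMPLES_PER_FRAME samples: (batch, frames * samples)
    return np.repeat(frame_signal, SAMPLES_PER_FRAME, axis=1)

def discriminator(wav, v):
    """Hypothetical discriminator: waveform -> one realness score per example."""
    return wav @ v

batch = 4
w_gen = rng.normal(scale=0.1, size=(LATENT_DIM, EMA_CHANNELS * EMA_FRAMES))
v_disc = rng.normal(scale=0.01, size=EMA_FRAMES * SAMPLES_PER_FRAME)

z = rng.uniform(-1.0, 1.0, size=(batch, LATENT_DIM))
ema = articulatory_generator(z, w_gen)  # articulatory representation
wav = ema2wav_stub(ema)                 # articulatory -> acoustic
scores = discriminator(wav, v_disc)     # Discriminator only ever sees audio
print(ema.shape, wav.shape, scores.shape)  # (4, 12, 50) (4, 16000) (4,)
```

The sketch isolates one design point: the Discriminator never receives EMA directly. It scores waveforms, so the generator must learn articulatory trajectories that the fixed ema2wav model converts into realistic speech.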
