Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Paper Title

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Authors

Seung-won Park, Doo-young Kim, Myun-chul Joe

Abstract

We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be trained with conventional TTS datasets. We train a voice conversion system to reconstruct speech with Cotatron features, which is similar to the previous methods based on Phonetic Posteriorgram (PPG). By training and evaluating our system with 108 speakers from the VCTK dataset, we outperform the previous method in terms of both naturalness and speaker similarity. Our system can also convert speech from speakers that are unseen during training, and can utilize ASR to automate the transcription with minimal reduction in performance. Audio samples are available at https://mindslab-ai.github.io/cotatron, and the code with a pre-trained model will be made available soon.
