Paper title
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
Authors
Abstract
Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram. Audio samples are available at http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html.
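The abstract describes TFAN only at a high level: the converted features have their scale and bias adjusted in a way that reflects the time-frequency structure of the source mel-spectrogram. The sketch below illustrates one plausible reading of that idea, assuming the features are first instance-normalized and then modulated element-wise by a scale and a bias computed from the source mel-spectrogram. The function names, the per-channel linear projection standing in for learned convolution layers, and the tensor shapes are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def instance_norm(f, eps=1e-5):
    """Normalize each channel of f (channels, freq, time) over freq and time."""
    mu = f.mean(axis=(1, 2), keepdims=True)
    sigma = f.std(axis=(1, 2), keepdims=True)
    return (f - mu) / (sigma + eps)


def tfan_sketch(f, x, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical TFAN-style layer.

    f: (C, Q, T) intermediate features of the converter.
    x: (Q, T) source mel-spectrogram, assumed already resized to f's grid.
    w_*, b_*: (C,) per-channel weights and offsets. They are a simplified
        stand-in (assumption) for the learned layers that would produce
        the scale and bias maps in a real model.
    """
    # Element-wise scale and bias maps, each of shape (C, Q, T), so they
    # vary with both frequency and time according to the source x.
    gamma = w_gamma[:, None, None] * x[None] + b_gamma[:, None, None]
    beta = w_beta[:, None, None] * x[None] + b_beta[:, None, None]
    # Normalize the features, then modulate them with the source-derived maps.
    return gamma * instance_norm(f) + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.normal(size=(4, 80, 16))   # 4 channels, 80 mel bins, 16 frames
    x = rng.normal(size=(80, 16))      # source mel-spectrogram
    c = f.shape[0]
    out = tfan_sketch(f, x, rng.normal(size=c), np.ones(c),
                      rng.normal(size=c), np.zeros(c))
    print(out.shape)
```

Because the scale and bias are maps over frequency and time rather than per-channel scalars, the source structure can be reinjected at every position, which is the property the abstract attributes to TFAN. In a trained model these maps would come from learned layers rather than the fixed projection used here.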