Paper Title
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Paper Authors
Paper Abstract
End-to-end text-to-speech (TTS) systems have been developed for European languages such as English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, the development of end-to-end TTS for Indian languages lags behind in quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence when the vocabulary size is large. In the work reported in this paper, we investigate fine-tuning an English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural-sounding Sanskrit speech in a low-resource setting. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good spoken knowledge of Sanskrit. This is a strong result considering that the speech data used amounts to only 2.5 hours.
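The abstract describes warm-starting Tacotron2 from English-pretrained weights and then fine-tuning on a small Sanskrit corpus. Below is a minimal PyTorch sketch of that transfer-learning idea, assuming a generic Tacotron2 implementation: the `Tacotron2` class, `SanskritTextMelDataset`, checkpoint path, and hyperparameters are illustrative assumptions, not the authors' exact code or settings.

```python
# Sketch: warm-start Tacotron2 from an English-pretrained checkpoint,
# then fine-tune on a small (~2.5 h) Sanskrit corpus.
# All module names, paths, and hyperparameters below are assumptions.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

from model import Tacotron2                      # assumed Tacotron2 implementation
from data_utils import SanskritTextMelDataset    # assumed dataset of (text_ids, mel, gate) tuples


def warm_start(model, checkpoint_path, ignore_prefixes=("embedding.",)):
    """Load pretrained weights, skipping layers tied to the English symbol set."""
    state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    filtered = {k: v for k, v in state.items()
                if not k.startswith(ignore_prefixes)}
    # strict=False leaves the skipped layers at their fresh initialization
    model.load_state_dict(filtered, strict=False)
    return model


model = Tacotron2()  # same architecture/hyperparameters as the English model
model = warm_start(model, "tacotron2_english_pretrained.pt")

# Fine-tune with a small learning rate on the Sanskrit data.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-6)
# In practice a padding collate_fn is needed for variable-length batches.
loader = DataLoader(SanskritTextMelDataset("sanskrit_metadata.csv"),
                    batch_size=16, shuffle=True)

model.train()
for epoch in range(500):
    for text_ids, mels, gates in loader:
        optimizer.zero_grad()
        # Teacher-forced forward pass; the exact signature varies by implementation.
        mel_pred, mel_post, gate_pred = model(text_ids, mels)
        # Typical Tacotron2 objective: mel reconstruction + stop-token prediction.
        loss = (F.mse_loss(mel_pred, mels)
                + F.mse_loss(mel_post, mels)
                + F.binary_cross_entropy_with_logits(gate_pred, gates))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
```

Skipping the text-embedding layer when loading the checkpoint is the usual way to handle a symbol set that differs between the source (English) and target (Sanskrit) text frontends; the remaining encoder, attention, and decoder weights carry over the acoustic knowledge learned from the larger English corpus.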