Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems

Paper Title


Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems

Authors

Vipperla, Ravichander; Park, Sangjun; Choo, Kihyun; Ishtiaq, Samin; Min, Kyoungbo; Bhattacharya, Sourav; Mehrotra, Abhinav; Ramos, Alberto Gil C. P.; Lane, Nicholas D.

Abstract


LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with less than a 0.1 decrease in TTS mean opinion score (MOS).
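
The two bunching ideas can be illustrated with a toy per-sample cost model. This is a minimal sketch under stated assumptions, not the authors' implementation: the recurrent-step cost `GRU_MACS`, the output-layer input width `HIDDEN`, the 4+4 bit split, and every helper name below are hypothetical. Sample-bunching is modeled as one recurrent step shared by `bunch` output samples; bit-bunching is modeled as replacing a single 256-way softmax over an 8-bit code with several smaller softmaxes, each predicting one slice of the bits.

```python
# Toy cost model (hypothetical sizes, NOT figures from the Bunched LPCNet paper).


def final_layer_macs(hidden: int, bit_split: list[int]) -> int:
    """Multiply-accumulates of the output layer(s) for ONE audio sample.

    bit_split lists how many bits each softmax predicts:
    [8] is one 256-way softmax; [4, 4] is two 16-way softmaxes (assumed split).
    """
    return sum(hidden * (2 ** b) for b in bit_split)


def split_code(code: int, bit_split: list[int]) -> list[int]:
    """Split an unsigned code (e.g. 8-bit mu-law) into bit slices, MSB first."""
    parts = []
    shift = sum(bit_split)
    for b in bit_split:
        shift -= b
        parts.append((code >> shift) & ((1 << b) - 1))
    return parts


def join_code(parts: list[int], bit_split: list[int]) -> int:
    """Inverse of split_code: reassemble the slices into one code."""
    code = 0
    for p, b in zip(parts, bit_split):
        code = (code << b) | p
    return code


def macs_per_sample(gru_macs: int, hidden: int, bit_split: list[int], bunch: int) -> float:
    """Average cost per sample when one recurrent step emits `bunch` samples.

    The recurrent step is shared by the whole bunch, while every sample
    still needs its own output layer(s).
    """
    step = gru_macs + bunch * final_layer_macs(hidden, bit_split)
    return step / bunch


if __name__ == "__main__":
    GRU_MACS = 400_000  # hypothetical cost of one recurrent step
    HIDDEN = 16         # hypothetical width feeding the output layer
    base = macs_per_sample(GRU_MACS, HIDDEN, [8], bunch=1)
    bunched = macs_per_sample(GRU_MACS, HIDDEN, [4, 4], bunch=2)
    print(f"baseline: {base:.0f} MACs/sample, bunched: {bunched:.0f} MACs/sample")
    print(split_code(0xA7, [4, 4]), join_code([0xA, 0x7], [4, 4]))
```

The model deliberately ignores the extra inputs and conditioning a real implementation would need (for example, feeding the first bit slice back in when predicting the second, or the wider layers required to emit several samples per step). Its ratio therefore should not be read as a prediction of the paper's reported 2.19x end-to-end speedup.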


