Paper Title

Translate Reverberated Speech to Anechoic Ones: Speech Dereverberation with BERT

Paper Authors

Jiao, Yang

Paper Abstract

This work considers single-channel speech dereverberation. Inspired by the recent success of the Bidirectional Encoder Representations from Transformers (BERT) model in Natural Language Processing (NLP), we investigate its applicability as a backbone sequence model for enhancing reverberated speech signals. We present a variation of the basic BERT model: a pre-sequence network, placed before the backbone sequence model, which extracts local spectral-temporal information and/or provides order information. In addition, we use a pre-trained neural vocoder for implicit phase reconstruction. To evaluate our method, we use the data from the 3rd CHiME challenge and compare our results with other methods. Experiments show that the proposed method outperforms the traditional method WPE and achieves performance comparable to state-of-the-art BLSTM-based sequence models.
