Paper Title

Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders

Authors

Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Carlos Segura

Abstract

Current end-to-end approaches to Spoken Language Translation (SLT) rely on limited training resources, especially for multilingual settings. On the other hand, Multilingual Neural Machine Translation (MultiNMT) approaches rely on higher-quality and much larger datasets. Our proposed method extends a MultiNMT architecture based on language-specific encoders-decoders to the task of Multilingual SLT (MultiSLT). Our method entirely eliminates the dependency on MultiSLT data and is able to translate while training only on ASR and MultiNMT data. Our experiments on four different languages show that coupling the speech encoder to the MultiNMT architecture produces translations of similar quality to a bilingual baseline ($\pm 0.2$ BLEU) while effectively allowing for zero-shot MultiSLT. Additionally, we propose using an Adapter module for coupling the speech inputs. This Adapter module produces consistent improvements of up to +6 BLEU points over the proposed architecture and +1 BLEU point over the end-to-end baseline.
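As a rough illustration of the Adapter idea mentioned in the abstract (coupling speech-encoder states to a pretrained MultiNMT encoder-decoder), the sketch below implements a generic bottleneck adapter: down-project, nonlinearity, up-project, residual connection. The class name, dimensions, and initialization are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

class Adapter:
    """Hypothetical bottleneck adapter: maps speech-encoder states into the
    space expected by a text MultiNMT model, with a residual connection.
    Shapes and init scale are assumptions for illustration only."""

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (d_bottleneck, d_model))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (seq_len, d_model) speech-encoder hidden states
        h = np.maximum(x @ self.w_down, 0.0)  # down-projection + ReLU
        return x + h @ self.w_up              # up-projection + residual

adapter = Adapter(d_model=512, d_bottleneck=64)
states = np.zeros((10, 512))  # dummy speech-encoder output
out = adapter(states)
print(out.shape)  # (10, 512)
```

The residual connection means the adapter starts close to an identity mapping, so the frozen MultiNMT modules still receive inputs in a familiar range early in training.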
