Paper Title

Enhancing and Adversarial: Improve ASR with Speaker Labels

Paper Authors

Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

Paper Abstract

ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, two opposite objectives that respectively increase or decrease domain variance towards domain-aware or domain-agnostic ASR. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner datasets and weaker ASR NNs.
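The gradient reversal layer (GRL) at the core of such adversarial MTL acts as the identity in the forward pass but negates (and scales) gradients in the backward pass, so the shared encoder is pushed towards speaker-agnostic features while the auxiliary classifier still learns to predict speaker labels. The following is a minimal framework-free sketch of the standard GRL idea only; the paper's adaptive variant (which removes the need to tune the scaling factor) is not reproduced here, and the class and parameter names are illustrative assumptions:

```python
class GradientReversal:
    """Sketch of a gradient reversal layer.

    Forward: identity on the activations.
    Backward: multiplies incoming gradients by -lambda_, reversing the
    adversarial signal before it reaches the shared encoder.
    """

    def __init__(self, lambda_: float = 1.0):
        # lambda_ is the reversal scale; the paper's adaptive GRL adjusts
        # this automatically, here it is a fixed hyperparameter.
        self.lambda_ = lambda_

    def forward(self, x):
        # Identity: the speaker classifier sees the features unchanged.
        return x

    def backward(self, grad_output):
        # Reverse and scale the gradient flowing back to the encoder.
        return [-self.lambda_ * g for g in grad_output]
```

In a framework such as PyTorch this would typically be implemented as a custom autograd function inserted between the encoder and the speaker classification head; the enhancing (domain-aware) objective is the same setup without the reversal.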
