End to End ASR System with Automatic Punctuation Insertion

Title

End to End ASR System with Automatic Punctuation Insertion

Authors

Guan, Yushi

Abstract

Recent automatic speech recognition (ASR) systems have been moving towards end-to-end systems whose components are trained jointly. Numerous recently proposed techniques have enabled this trend, including feature extraction with CNNs, context capturing and acoustic feature modeling with RNNs, automatic alignment of input and output sequences using Connectionist Temporal Classification (CTC), and the replacement of traditional n-gram language models with RNN language models. Historically, automatic punctuation has attracted considerable interest in both text and speech-to-text settings. However, there appears to be little interest in incorporating automatic punctuation into the emerging neural-network-based end-to-end speech recognition systems, partly because of the lack of English speech corpora with punctuated transcripts. In this study, we propose a method to generate punctuated transcripts for the TEDLIUM dataset using the transcripts available from ted.com. We also propose an end-to-end ASR system that outputs words and punctuation marks concurrently from speech signals. By combining the Damerau-Levenshtein distance and the slot error rate into DLev-SER, we enable measurement of the punctuation error rate when the hypothesis text is not perfectly aligned with the reference. Compared with previous methods, our model reduces the slot error rate from 0.497 to 0.341.
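The abstract does not spell out how DLev-SER is computed, so the following is only one plausible reading, not the paper's implementation. It aligns reference and hypothesis tokens with the optimal-string-alignment (restricted) variant of the Damerau-Levenshtein distance, then scores punctuation tokens with the standard slot error rate, SER = (S + D + I) / (C + S + D), where C + S + D is the number of reference punctuation slots. The punctuation set `PUNCT`, the hypothetical helper names `osa_alignment` and `dlev_ser`, treating punctuation marks as separate tokens, and the rule that a mark moved by one position (a transposition) counts as a single substitution are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed punctuation inventory; the paper's actual label set may differ.
PUNCT = {",", ".", "?", "!"}


def osa_alignment(ref: Sequence[str], hyp: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Align two token sequences with the optimal-string-alignment variant of
    the Damerau-Levenshtein distance and return the edit operations.

    Each op is (kind, i, j) with kind in {"match", "sub", "del", "ins", "trans"};
    i and j are 0-based positions in ref and hyp (for "trans", the first of the
    two swapped tokens).
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
            if (i > 1 and j > 1 and ref[i - 1] == hyp[j - 2]
                    and ref[i - 2] == hyp[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    # Backtrace from the bottom-right cell along any optimal path.
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 1 and j > 1 and ref[i - 1] == hyp[j - 2] and ref[i - 2] == hyp[j - 1]
                and ref[i - 1] != hyp[j - 1] and d[i][j] == d[i - 2][j - 2] + 1):
            ops.append(("trans", i - 2, j - 2))
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("match" if ref[i - 1] == hyp[j - 1] else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


def dlev_ser(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Punctuation slot error rate over a Damerau-Levenshtein (OSA) alignment."""
    c = s = dl = ins = 0
    for kind, i, j in osa_alignment(ref, hyp):
        if kind == "match":
            c += ref[i] in PUNCT
        elif kind == "sub":
            r_p, h_p = ref[i] in PUNCT, hyp[j] in PUNCT
            if r_p and h_p:
                s += 1      # wrong punctuation mark
            elif r_p:
                dl += 1     # reference mark replaced by a word
            elif h_p:
                ins += 1    # word replaced by a spurious mark
        elif kind == "del":
            dl += ref[i] in PUNCT
        elif kind == "ins":
            ins += hyp[j] in PUNCT
        else:  # "trans": assumed to count as one substitution per moved mark
            s += (ref[i] in PUNCT) + (ref[i + 1] in PUNCT)
    n_ref = c + s + dl  # number of reference punctuation slots
    if n_ref == 0:
        # Arbitrary convention for references without punctuation.
        return 0.0 if ins == 0 else float("inf")
    return (s + dl + ins) / n_ref
```

For example, `dlev_ser(["hello", ",", "world"], [",", "hello", "world"])` returns 1.0, because the comma shifted by one word is a single transposition. Under the same counting rules, a plain Levenshtein alignment would record one deletion and one insertion of punctuation and give 2.0, which illustrates why a transposition-aware alignment may help when the hypothesis is not perfectly aligned with the reference.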
