Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

Paper title

Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

Authors

Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

Abstract

We propose a new method for calculating error rates in Automatic Speech Recognition (ASR). The new metric targets languages that contain half characters and in which the same character can be written in different forms. We implement our methodology in Hindi, one of the main languages in the Indic context, and we believe the approach scales to other similar languages with large character sets. We call our metrics Alternate Word Error Rate (AWER) and Alternate Character Error Rate (ACER). We train our ASR models for Indic languages using wav2vec 2.0 (Baevski et al., 2020). Additionally, we use language models to improve model performance. Our results show a significant improvement in analyzing error rates at the word and character level: the interpretability of the ASR system improves by up to 3% in AWER and 7% in ACER for Hindi. Our experiments suggest that in languages with complex pronunciation there are multiple ways of writing a word without changing its meaning. In such cases, AWER and ACER are more useful metrics than WER and CER. Further, we open-source a new 21-hour benchmarking dataset for Hindi together with the new metric scripts.
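To make the idea behind an alternate word error rate concrete, the following is a minimal sketch and not the authors' released metric scripts. It assumes that accepted spelling variants are supplied as a lookup table; the names `edit_distance`, `wer`, `alternate_wer`, and the `variants` mapping are hypothetical and introduced only for illustration. The example relies on the fact that हिंदी and हिन्दी are both accepted spellings of "Hindi".

```python
from typing import Callable, Dict, List, Set


def edit_distance(ref: List[str], hyp: List[str],
                  equal: Callable[[str, str], bool]) -> int:
    """Levenshtein distance over tokens, with a pluggable equality test."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if equal(r, h) else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Standard word error rate: word-level edits / reference length."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split(), lambda a, b: a == b) / len(ref_words)


def alternate_wer(ref: str, hyp: str, variants: Dict[str, Set[str]]) -> float:
    """WER in which a hypothesis word also matches if it is a listed
    variant of the reference word (hypothetical variant table)."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    def equal(r: str, h: str) -> bool:
        return r == h or h in variants.get(r, set())

    return edit_distance(ref_words, hyp.split(), equal) / len(ref_words)


if __name__ == "__main__":
    reference = "मैं हिंदी बोलता हूँ"
    hypothesis = "मैं हिन्दी बोलता हूँ"   # same sentence, alternate spelling
    variants = {"हिंदी": {"हिन्दी"}}      # illustrative table, not from the paper

    print(wer(reference, hypothesis))                      # 0.25
    print(alternate_wer(reference, hypothesis, variants))  # 0.0
```

Under plain WER the alternate spelling is counted as a substitution, while the variant-aware comparison treats it as correct. A character-level analogue would apply the same comparison to character sequences instead of words.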
