Paper Title
Contextualized Spoken Word Representations from Convolutional Autoencoders
Paper Authors
Paper Abstract
A lot of work has been done to build text-based language models for performing different NLP tasks, but not much research has been done on audio-based language models. This paper proposes a Convolutional Autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of spoken words of varying length. The use of such representations can not only lead to great advances in audio-based NLP tasks but can also curtail the loss of information such as tone, expression, and accent that occurs when speech is converted to text to perform these tasks. The performance of the proposed model is validated by (1) examining the generated vector space, and (2) evaluating its performance on three benchmark datasets for measuring word similarity, against existing widely used text-based language models trained on the transcriptions. The proposed model was able to demonstrate its robustness when compared to the two text-based language models.
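The paper itself does not give implementation details in the abstract, but the core idea — a convolutional encoder that maps variable-length spoken-word features to a fixed-size embedding, paired with a decoder that reconstructs the input — can be illustrated with a minimal NumPy sketch. All names, dimensions, and the pooling/decoding choices below are assumptions for illustration, not the authors' architecture:

```python
import numpy as np

def conv1d(x, w):
    # Valid 1-D convolution over time.
    # x: (T, C_in) frame sequence, w: (K, C_in, C_out) -> (T-K+1, C_out)
    K = w.shape[0]
    T_out = x.shape[0] - K + 1
    return np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1]))
                     for t in range(T_out)])

class ConvAutoencoder:
    """Hypothetical sketch: encode a variable-length mel-spectrogram
    into a fixed-size word embedding, then reconstruct the frames."""

    def __init__(self, n_mels=40, n_filters=64, kernel=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(0.0, 0.1, (kernel, n_mels, n_filters))
        self.w_dec = rng.normal(0.0, 0.1, (n_filters, n_mels))

    def encode(self, spectrogram):
        # Convolution + ReLU, then global average pooling over time,
        # so any input length yields the same embedding size.
        h = np.maximum(conv1d(spectrogram, self.w_enc), 0.0)
        return h.mean(axis=0)                      # shape: (n_filters,)

    def decode(self, z, n_frames):
        # Broadcast the embedding over time and project back to mel bins.
        return np.tile(z, (n_frames, 1)) @ self.w_dec  # (n_frames, n_mels)

# A spoken word of 120 frames with 40 mel features per frame.
ae = ConvAutoencoder()
x = np.random.default_rng(1).normal(size=(120, 40))
z = ae.encode(x)
x_hat = ae.decode(z, x.shape[0])
print(z.shape, x_hat.shape)  # (64,) (120, 40)
```

In a real system the weights would be trained to minimize reconstruction error (e.g. mean squared error between `x` and `x_hat`), and the fixed-size embedding `z` would then serve as the contextualized spoken-word representation evaluated against text-based models.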