Paper Title
Auxiliary Sequence Labeling Tasks for Disfluency Detection
Paper Authors
Paper Abstract
Detecting disfluencies in spontaneous speech is an important preprocessing step in natural language processing and speech recognition applications. Existing work on disfluency detection has focused on designing a single objective only for disfluency detection, whereas auxiliary objectives that exploit a word's linguistic information, such as named entity or part-of-speech information, can be effective. In this paper, we focus on detecting disfluencies in spoken transcripts and propose a method that uses named entity recognition (NER) and part-of-speech (POS) tagging as auxiliary sequence labeling (SL) tasks for disfluency detection. First, we investigate cases in which utilizing a word's linguistic information prevents important words from being mispredicted and helps detect disfluencies correctly. Second, we show that training a disfluency detection model with auxiliary SL tasks improves its F-score on disfluency detection. Then, we analyze which auxiliary SL tasks are influential depending on the baseline model. Experimental results on the widely used English Switchboard dataset show that our method outperforms the previous state-of-the-art in disfluency detection.
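To make the idea of auxiliary SL tasks concrete, below is a minimal sketch (not the authors' implementation) of multi-task sequence labeling: a shared encoder over the spoken transcript with separate tagging heads for disfluency detection and the auxiliary NER and POS tasks, trained with a weighted sum of per-task losses. The encoder choice, tag-set sizes, and the auxiliary loss weight `aux_weight` are illustrative assumptions, not values from the paper.

```python
# Minimal multi-task sequence labeling sketch (assumptions noted above).
import torch
import torch.nn as nn


class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256,
                 n_disfluency_tags=2, n_ner_tags=9, n_pos_tags=45):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Shared BiLSTM encoder over the transcript tokens.
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2,
                               batch_first=True, bidirectional=True)
        # One tagging head per task on top of the shared hidden states.
        self.disfluency_head = nn.Linear(hidden_dim, n_disfluency_tags)
        self.ner_head = nn.Linear(hidden_dim, n_ner_tags)
        self.pos_head = nn.Linear(hidden_dim, n_pos_tags)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embedding(token_ids))
        return (self.disfluency_head(states),
                self.ner_head(states),
                self.pos_head(states))


def multitask_loss(logits, labels, aux_weight=0.5):
    """Main disfluency loss plus down-weighted auxiliary NER/POS losses."""
    ce = nn.CrossEntropyLoss()
    disf_logits, ner_logits, pos_logits = logits
    disf_y, ner_y, pos_y = labels
    main = ce(disf_logits.flatten(0, 1), disf_y.flatten())
    aux = (ce(ner_logits.flatten(0, 1), ner_y.flatten()) +
           ce(pos_logits.flatten(0, 1), pos_y.flatten()))
    return main + aux_weight * aux


if __name__ == "__main__":
    model = MultiTaskTagger(vocab_size=10000)
    tokens = torch.randint(0, 10000, (4, 20))        # batch of 4 utterances
    labels = (torch.randint(0, 2, (4, 20)),          # disfluency tags
              torch.randint(0, 9, (4, 20)),          # NER tags
              torch.randint(0, 45, (4, 20)))         # POS tags
    loss = multitask_loss(model(tokens), labels)
    loss.backward()
    print(float(loss))
```

In this setup, only the shared encoder and embeddings receive gradients from all three tasks, which is how the auxiliary NER and POS signals can shape the representations used by the disfluency head; the paper's actual architecture and training details are given in the full text.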