End-to-end spoofing detection with raw waveform CLDNNs

Paper title

End-to-end spoofing detection with raw waveform CLDNNs

Authors

Heinrich Dinkel, Nanxin Chen, Yanmin Qian, Kai Yu

Abstract

Although recent progress in speaker verification has produced powerful models, malicious attacks in the form of spoofed speech are generally not addressed. Recent results in the ASVSpoof2015 and BTAS2016 challenges indicate that spoof-aware features are a possible solution to this problem. Most successful methods in both challenges focus on spoof-aware features rather than on a powerful classifier. In this paper we present a novel raw-waveform-based deep model for spoofing detection, which jointly acts as a feature extractor and classifier, thus allowing it to classify speech signals directly. This approach can be considered an end-to-end classifier: it removes the need for any pre- or post-processing of the data, making training and evaluation a streamlined process that consumes less time than other neural-network-based approaches. Experiments on the BTAS2016 dataset show that the proposed raw waveform convolutional long short-term neural network (CLDNN) significantly improves system performance, from the previous best published half total error rate (HTER) of 1.26% to 0.82%. Moreover, the proposed system also performs well under the unknown (RE-PH2-PH3, RE-LPPH2-PH3) conditions.
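
The abstract reports results as half total error rate (HTER) without defining it. As a minimal sketch, assuming the usual definition (the mean of the false acceptance rate and the false rejection rate at a fixed decision threshold), it could be computed as below. The function name, the toy scores, and the threshold are illustrative assumptions and are not taken from the paper; how the threshold is chosen is not stated here.

```python
def half_total_error_rate(scores, labels, threshold):
    """HTER = (FAR + FRR) / 2 at a fixed decision threshold.

    labels: True for genuine speech, False for spoofed speech.
    A trial with score >= threshold is accepted as genuine.
    """
    genuine = [s for s, y in zip(scores, labels) if y]
    spoofed = [s for s, y in zip(scores, labels) if not y]
    if not genuine or not spoofed:
        raise ValueError("need at least one genuine and one spoofed trial")
    # False acceptance rate: spoofed trials wrongly accepted as genuine.
    far = sum(s >= threshold for s in spoofed) / len(spoofed)
    # False rejection rate: genuine trials wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return (far + frr) / 2


# Toy scores, invented for illustration only.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [True, True, True, False, False, False]
hter = half_total_error_rate(scores, labels, threshold=0.5)
```

With these toy values one of three spoofed trials is accepted and one of three genuine trials is rejected, so both error rates, and therefore the HTER, equal one third.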
