Paper title
Detection and Evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems
Authors
Abstract
Automatic speaker verification (ASV) systems use the biometric information in human speech to verify a speaker's identity. The techniques used to perform speaker verification are often vulnerable to malicious attacks that attempt to induce the ASV system to return wrong results, allowing an impostor to bypass the system and gain access. Attackers use many spoofing techniques for this, such as voice conversion, audio replay, and speech synthesis. In recent years, easily available tools for generating deepfaked audio have increased the potential threat to ASV systems. In this paper, we compare the potential of human impersonation (voice disguise) based attacks with that of attacks based on machine-generated speech, on both black-box and white-box ASV systems. We also study countermeasures that use features capturing the unique aspects of human speech production, under the hypothesis that machines cannot emulate many of the fine-level intricacies of the human speech production mechanism. We show that fundamental frequency sequence-related entropy, spectral envelope, and aperiodic parameters are promising candidates for robust detection of deepfaked speech generated by unknown methods.
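The abstract names fundamental frequency (F0) sequence-related entropy as a countermeasure feature but does not define it. The following is a minimal sketch of one plausible reading, not the paper's definition: take a per-frame F0 track in Hz (assumed to use 0 for unvoiced frames, as many pitch trackers do), convert voiced frames to cents, quantize the frame-to-frame pitch changes, and compute the Shannon entropy of the resulting histogram. The function name `f0_delta_entropy`, the 50-cent bin width, and the choice to drop unvoiced frames are all hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def f0_delta_entropy(f0_hz: Sequence[float], step_cents: float = 50.0) -> float:
    """Shannon entropy (bits) of quantized frame-to-frame pitch changes.

    Hypothetical feature, not the paper's definition. Frames with f0 <= 0 are
    treated as unvoiced and dropped, so the remaining voiced frames are
    joined across unvoiced gaps.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        # Fewer than two voiced frames: no pitch changes to measure.
        return 0.0
    # Convert Hz to cents (relative to 1 Hz) so differences are pitch intervals.
    cents = [1200.0 * math.log2(f) for f in voiced]
    # Quantize each consecutive pitch change into an integer bin index.
    bins = [round((b - a) / step_cents) for a, b in zip(cents, cents[1:])]
    counts = Counter(bins)
    total = len(bins)
    # H = sum p * log2(1/p); written this way the result is never -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(f0_delta_entropy([120.0] * 50))                       # flat contour
    print(f0_delta_entropy([100.0, 200.0, 100.0, 200.0, 100.0]))  # octave jumps
```

A perfectly flat contour yields 0 bits, and a contour alternating between two pitches an octave apart yields 1 bit, because its changes fall into two equally frequent bins. The sketch makes no claim about whether human or machine-generated speech scores higher; that relationship would have to come from the paper's experiments.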