Paper Title
A Cross-Verification Approach for Protecting World Leaders from Fake and Tampered Audio
Paper Authors
Paper Abstract
This paper tackles the problem of verifying the authenticity of speech recordings from world leaders. Whereas previous work on detecting deepfake or tampered audio focuses on scrutinizing an audio recording in isolation, we reframe the problem and focus on cross-verifying a questionable recording against trusted references. We present a two-step method for cross-verifying a speech recording against a reference: aligning the two recordings and then classifying each query frame as matching or non-matching. We propose a subsequence alignment method based on the Needleman-Wunsch algorithm and show that it significantly outperforms dynamic time warping in handling common tampering operations. We also explore several binary classification models based on LSTM and Transformer architectures to verify content at the frame level. Through extensive experiments on tampered speech recordings of Donald Trump, we show that our system can reliably detect audio tampering operations of different types and durations. Our best model achieves 99.7% accuracy on the alignment task at an error tolerance of 50 ms and a 0.43% equal error rate in classifying audio frames as matching or non-matching.
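To make the alignment step more concrete, the following is a minimal sketch of a Needleman-Wunsch-style subsequence alignment. It is not the authors' implementation: the abstract does not specify the features, scoring function, or gap costs, so the cosine similarity, the `bias` offset, the `gap` penalty, and the function names `cosine` and `subsequence_align` are all assumptions chosen for illustration. The sketch lets the query start and end anywhere in the reference at no cost, while every query frame must either be paired with a reference frame or be skipped at a penalty.

```python
import math

DIAG, UP, LEFT = 0, 1, 2  # pair frames, skip a query frame, skip a reference frame


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def subsequence_align(query, ref, gap=-1.0, bias=0.5):
    """Return, for each query frame, the paired reference index or -1 if skipped.

    gap and bias are illustrative values (assumptions), not taken from the paper.
    """
    n, m = len(query), len(ref)
    # score[i][j]: best score for query[:i] against a stretch of ref ending at j.
    # Row 0 stays at zero, so the query may begin anywhere in the reference.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[LEFT] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = gap * i  # skipped query frames are penalized
        move[i][0] = UP

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = cosine(query[i - 1], ref[j - 1]) - bias  # > 0 only for similar frames
            options = (
                score[i - 1][j - 1] + s,  # DIAG
                score[i - 1][j] + gap,    # UP
                score[i][j - 1] + gap,    # LEFT
            )
            k = max(range(3), key=options.__getitem__)
            move[i][j] = k
            score[i][j] = options[k]

    # Free end: take the best cell in the last row, then trace back to row 0.
    i, j = n, max(range(m + 1), key=score[n].__getitem__)
    path = [-1] * n
    while i > 0:
        step = move[i][j]
        if step == DIAG:
            path[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif step == UP:
            i -= 1
        else:
            j -= 1
    return path


def one_hot(k, dim=9):
    """Toy stand-in for an audio feature frame."""
    return [1.0 if t == k else 0.0 for t in range(dim)]


ref = [one_hot(k) for k in range(8)]
# Query: reference frames 2..5 with one foreign frame inserted in the middle.
query = [one_hot(2), one_hot(3), one_hot(8), one_hot(4), one_hot(5)]
print(subsequence_align(query, ref))  # [2, 3, -1, 4, 5]
```

In this toy case the inserted frame is left unpaired while the surrounding frames stay locked to their reference positions. A frame that is paired by the alignment is not thereby confirmed as matching: in the approach described above, that decision belongs to the separate frame-level classification step.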