Paper Title
Fingerprinting Encrypted Voice Traffic on Smart Speakers with Deep Learning
Authors
Abstract
This paper investigates the privacy leakage of smart speakers under an encrypted traffic analysis attack, referred to as voice command fingerprinting. In this attack, an adversary eavesdrops on both the outgoing and incoming encrypted voice traffic of a smart speaker and infers which voice command a user has spoken from the encrypted traffic alone. We first built an automatic voice traffic collection tool and collected two large-scale datasets on two smart speakers, Amazon Echo and Google Home. Then, we implemented proof-of-concept attacks by leveraging deep learning. Our experimental results on the two datasets indicate disturbing privacy concerns. Specifically, compared to the 1% accuracy of random guessing, our attacks correctly infer voice commands over encrypted traffic with 92.89% accuracy on Amazon Echo. Despite the variance that human voices may introduce into outgoing traffic, our proof-of-concept attacks remain effective even when leveraging only incoming traffic (i.e., the traffic from the server). This is because the AI-based voice services running on the server side respond to commands in the same voice and in a deterministic or predictable manner in text, which leaves distinguishable patterns in the encrypted traffic. We also built a proof-of-concept defense to obfuscate encrypted traffic. Our results show that the defense effectively reduces attack accuracy on Amazon Echo to 32.18%.
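The idea that encrypted responses still leak through packet sizes and directions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the feature representation, the network architecture, or the obfuscation method, so the `(direction, size)` trace format, the function names `to_feature_vector` and `pad_trace`, the fixed-bucket padding, and the example traces are all hypothetical. The sketch covers only a plausible input representation for a classifier and one simple obfuscation strategy, not the deep learning model itself.

```python
from typing import List, Tuple

# Hypothetical trace format: (direction, size_in_bytes),
# with +1 for speaker-to-server and -1 for server-to-speaker.
Packet = Tuple[int, int]


def to_feature_vector(trace: List[Packet], length: int = 4,
                      incoming_only: bool = False) -> List[int]:
    """Turn a trace into a fixed-length vector of signed packet sizes."""
    signed = [d * s for d, s in trace if not (incoming_only and d > 0)]
    signed = signed[:length]                       # truncate long traces
    return signed + [0] * (length - len(signed))   # zero-pad short traces


def pad_trace(trace: List[Packet], bucket: int = 1024) -> List[Packet]:
    """Assumed defense: round every packet size up to a multiple of bucket."""
    return [(d, -(-s // bucket) * bucket) for d, s in trace]


# Two made-up commands whose server responses differ in size.
weather = [(+1, 300), (-1, 1400), (-1, 900)]
clock = [(+1, 310), (-1, 1400), (-1, 200)]

# Incoming traffic alone already separates the two commands ...
plain_w = to_feature_vector(weather, incoming_only=True)
plain_c = to_feature_vector(clock, incoming_only=True)

# ... while padding makes these two traces indistinguishable.
padded_w = to_feature_vector(pad_trace(weather), incoming_only=True)
padded_c = to_feature_vector(pad_trace(clock), incoming_only=True)
```

In this toy case the unpadded vectors differ while the padded ones coincide. Real traces also vary in packet count and timing, so simple size padding would not be expected to hide everything, which is consistent with the residual 32.18% attack accuracy reported for the defense.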