论文标题
新的安全稀疏内部产品,并应用机器学习
New Secure Sparse Inner Product with Applications to Machine Learning
论文作者
论文摘要
稀疏的内部产品(SIP)具有由当事人之间的输入交叉所主导的高间接费用的吸引力,而与实际输入大小无关。它具有吸引人的前景,尤其是在大规模数据上增强机器学习,这与稀疏的数据纠缠在一起。在本文中,我们调查了以前很少探索的保护隐私的SIP问题。具体来说,我们提出了两种混凝土构造,一种需要离线线性通信,可以在查询中摊销,而另一个则具有sublinear架设,但依赖于更昂贵的工具。我们的方法利用了最新的加密工具,包括乱七八糟的布鲁姆过滤器(GBF)和私人信息检索(PIR)作为基石,但仔细融合了它们以获得非平凡的高架降低。我们提供对拟议结构的正式安全分析,并将它们实施到代表性的机器学习算法中,包括K-Nearest邻居,天真的贝叶斯分类和逻辑回归。与现有的努力相比,我们的方法在运行时达到了$ 2 $ - $ 50 \ $ 50 \ times $速度,最高$ 10 \ times $ $ $减少通信。
Sparse inner product (SIP) has the attractive property of overhead being dominated by the intersection of inputs between parties, independent of the actual input size. It has intriguing prospects, especially for boosting machine learning on large-scale data, which is tangled with sparse data. In this paper, we investigate privacy-preserving SIP problems that have rarely been explored before. Specifically, we propose two concrete constructs, one requiring offline linear communication which can be amortized across queries, while the other has sublinear overhead but relies on the more computationally expensive tool. Our approach exploits state-of-the-art cryptography tools including garbled Bloom filters (GBF) and Private Information Retrieval (PIR) as the cornerstone, but carefully fuses them to obtain non-trivial overhead reductions. We provide formal security analysis of the proposed constructs and implement them into representative machine learning algorithms including k-nearest neighbors, naive Bayes classification and logistic regression. Compared to the existing efforts, our method achieves $2$-$50\times$ speedup in runtime and up to $10\times$ reduction in communication.