Paper Title

Privacy-preserving Data Sharing on Vertically Partitioned Data

Authors

Razane Tajeddine, Joonas Jälkö, Samuel Kaski, Antti Honkela

Abstract

In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, i.e., data in which the records of the same individuals are distributed across multiple data holders, or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm that trains a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), so that differentially private MPC can be used effectively to learn a probabilistic generative model under DP on vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. MPC is then used to aggregate the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain results comparable to the non-partitioned case.
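The factorization mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes independent Gaussian features inside each mixture component, evaluates a plain mixture likelihood rather than the variational objective, and performs the aggregation in the clear, where the paper would use MPC with DP noise. All function names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def party_contributions(features, component_params):
    """Computed locally by one party: for each mixture component k, the
    likelihood of this party's own features (assumed independent Gaussians)."""
    contributions = []
    for params in component_params:  # one parameter list per component k
        lik = 1.0
        for x, (mean, std) in zip(features, params):
            lik *= gaussian_pdf(x, mean, std)
        contributions.append(lik)
    return contributions


def mixture_likelihood(weights, per_party_contributions):
    """Sum over components of weight_k times the product of the parties'
    contributions. Done in the clear here; no MPC or DP noise is applied."""
    total = 0.0
    for k, w in enumerate(weights):
        prod = 1.0
        for contrib in per_party_contributions:
            prod *= contrib[k]
        total += w * prod
    return total


# One individual's record, split vertically between two parties (toy values).
x_a = [0.3, -1.2]  # features held by party A
x_b = [2.0]        # feature held by party B

weights = [0.4, 0.6]  # mixture weights for two components
# (mean, std) per feature, per component, for each party's features.
params_a = [[(0.0, 1.0), (-1.0, 0.5)], [(1.0, 2.0), (0.0, 1.0)]]
params_b = [[(2.0, 1.0)], [(0.0, 1.5)]]

# Factorized evaluation: each party only touches its own features.
factored = mixture_likelihood(
    weights,
    [party_contributions(x_a, params_a), party_contributions(x_b, params_b)],
)

# Reference evaluation on the pooled (non-partitioned) record.
joint_params = [pa + pb for pa, pb in zip(params_a, params_b)]
joint = mixture_likelihood(weights, [party_contributions(x_a + x_b, joint_params)])

print(math.isclose(factored, joint))
```

Because no component couples features held by different parties, each party's contribution depends only on its own columns, and only the per-component products and their weighted sum require cross-party aggregation.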
