Paper Title

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Authors

Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, Abhradeep Thakurta

Abstract

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either consist of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks---MNIST and CIFAR-10.
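To make the pipeline concrete, below is a minimal Python sketch of the Encode-Shuffle-Analyze flow the abstract describes. It is an illustration under stated assumptions, not the paper's implementation: it uses binary randomized response as the local encoder, a uniform permutation as the shuffler, and a debiased frequency estimate as the analyzer. The names `encode`, `shuffle_reports`, `analyze`, and `epsilon_local` are hypothetical.

```python
import numpy as np

# Minimal ESA sketch (illustrative only, not the paper's system).
# Encoder:  binary randomized response, a standard local-DP mechanism.
# Shuffler: uniformly permutes reports, modeling anonymous, unlinkable delivery.
# Analyzer: debiases the shuffled reports to estimate a population frequency.

def encode(bit: int, epsilon_local: float, rng: np.random.Generator) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = np.exp(epsilon_local) / (np.exp(epsilon_local) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def shuffle_reports(reports: list, rng: np.random.Generator) -> list:
    """Strip ordering/identity: the analyzer sees only an anonymous multiset."""
    reports = list(reports)
    rng.shuffle(reports)
    return reports

def analyze(reports: list, epsilon_local: float) -> float:
    """Invert the randomized-response bias to estimate the true fraction of 1s."""
    p_truth = np.exp(epsilon_local) / (np.exp(epsilon_local) + 1.0)
    observed = float(np.mean(reports))
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = np.random.default_rng(0)
true_bits = (rng.random(100_000) < 0.3).astype(int)   # 30% of users hold a 1
reports = shuffle_reports([encode(int(b), 1.0, rng) for b in true_bits], rng)
print(f"estimated fraction of 1s: {analyze(reports, 1.0):.3f}")  # ~0.300
```

Note that the shuffler does not change the statistic being estimated; its role is to break the link between a report and its sender, which is the source of the amplified central differential-privacy guarantees the abstract refers to. Fragmentation, as described above, would go one step further, splitting each user's record into several one-bit messages that are shuffled and delivered without any common identifier.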
