论文标题

稀疏的矢量技术,重新审视

The Sparse Vector Technique, Revisited

论文作者

Kaplan, Haim, Mansour, Yishay, Stemmer, Uri

论文摘要

我们重新审视了差异隐私文献中最基本和最广泛的技术之一 - 稀疏向量技术[Dwork等,Stoc 2009]。这种简单的算法私下测试数据库上给定查询的值是否接近我们期望的。只要答案与我们的期望接近,并且在第一个不是这样的查询之后,就可以提出无限数量的查询。 我们建议一种替代性,同样简单的算法,只要任何一个人都不会对答案偏离我们所期望的偏离的问题的答案,该算法可以继续测试查询。我们的分析很微妙,其某些成分可能更广泛地适用。在某些情况下,我们的新算法允许从数据库中私下提取更多信息,而不是原始算法。 我们通过将我们的算法应用于转移的重力问题来证明这一点:在每个时间步骤,$ n $用户的每一个都将获得新的输入,而任务是私下识别所有当前的重型击球者。也就是说,按时步骤$ i $,目标是确定所有数据元素$ x $,以便许多用户将$ x $作为当前输入。我们为此问题提供了一种算法,可以改善使用现有技术可以获得的错误。具体而言,我们的算法的错误取决于单个用户将重击作为输入的最大次数,而不是存在重击的总数。

We revisit one of the most basic and widely applicable techniques in the literature of differential privacy - the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows to ask an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries whose answer deviates substantially form what we expect. Our analysis is subtle and some of its ingredients may be more widely applicable. In some cases our new algorithm allows to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting heavy-hitters problem: On every time step, each of $n$ users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step $i$, the goal is to identify all data elements $x$ such that many of the users have $x$ as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源