Paper Title

Database Matching Under Adversarial Column Deletions

Paper Authors

Serhat Bakirtas, Elza Erkip

Paper Abstract

The de-anonymization of users from anonymized microdata through matching or aligning with publicly available correlated databases has been of scientific interest recently. While most of the rigorous analyses of database matching have focused on random-distortion models, adversarial-distortion models have been lacking in the relevant literature. In this work, motivated by synchronization errors in the sampling of time-indexed microdata, matching (alignment) of random databases under adversarial column deletions is investigated. It is assumed that a constrained adversary, which observes the anonymized database, can delete up to a $\delta$ fraction of the columns (attributes) to hinder matching and preserve privacy. Column histograms of the two databases are utilized as permutation-invariant features to detect the column deletion pattern chosen by the adversary. The detection of the column deletion pattern is then followed by an exact row (user) matching scheme. The worst-case analysis of this two-phase scheme yields a sufficient condition for the successful matching of the two databases, under the near-perfect recovery condition. A more detailed investigation of the error probability leads to a tight necessary condition on the database growth rate, and in turn, to a single-letter characterization of the adversarial matching capacity. This adversarial matching capacity is shown to be significantly lower than the random matching capacity, where the column deletions occur randomly. Overall, our results analytically demonstrate the privacy advantages of adversarial mechanisms over random ones when publishing anonymized time-indexed data.
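To make the two-phase idea concrete, below is a minimal, self-contained Python sketch of the pipeline the abstract describes: column histograms as permutation-invariant features to estimate the adversary's deletion pattern, followed by exact row matching on the retained columns. It is not the paper's exact scheme or analysis; the function names (`detect_retained_columns`, `match_rows`), the greedy histogram alignment, and the i.i.d. toy data are illustrative assumptions.

```python
from collections import Counter
import random

def column_histograms(db):
    """Permutation-invariant feature: the value counts of each column."""
    return [Counter(col) for col in zip(*db)]

def detect_retained_columns(hist_x, hist_y):
    """Phase 1 (illustrative): align Y's column histograms to a subsequence of X's.

    Greedy left-to-right alignment; it recovers the deletion pattern as long as
    no deleted column's histogram collides with the retained column being
    placed, which holds with high probability for large databases."""
    retained, i = [], 0
    for hy in hist_y:
        while i < len(hist_x) and hist_x[i] != hy:
            i += 1                      # column i of X is declared deleted
        if i == len(hist_x):
            raise ValueError("could not align column histograms")
        retained.append(i)
        i += 1
    return retained

def match_rows(db_x, db_y, retained):
    """Phase 2: exact row (user) matching on the retained columns."""
    signature_to_row = {tuple(row[j] for j in retained): r
                        for r, row in enumerate(db_x)}
    return [signature_to_row[tuple(row)] for row in db_y]

# Toy usage: X is the original database, Y its anonymized, column-deleted copy.
random.seed(0)
n, m, alphabet = 100, 12, list(range(8))
X = [[random.choice(alphabet) for _ in range(m)] for _ in range(n)]

delta = 0.25                            # adversary's deletion budget
deleted = set(random.sample(range(m), int(delta * m)))
perm = list(range(n))
random.shuffle(perm)                    # anonymization: hidden row permutation
Y = [[X[perm[r]][c] for c in range(m) if c not in deleted] for r in range(n)]

retained = detect_retained_columns(column_histograms(X), column_histograms(Y))
estimate = match_rows(X, Y, retained)
print(estimate == perm)                 # True when both phases succeed (w.h.p.)
```

In this sketch, success of the toy run relies on column histograms being pairwise distinct and row signatures on the retained columns being unique, mirroring the high-probability events behind the near-perfect recovery condition discussed in the abstract.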
