Paper Title
Forget Unlearning: Towards True Data-Deletion in Machine Learning
Paper Authors
Abstract
Unlearning algorithms aim to remove a deleted record's influence from a trained model at a cost lower than full retraining. However, prior unlearning guarantees in the literature are flawed and do not protect the privacy of deleted records. We show that when users delete their data as a function of published models, records in a database become interdependent; consequently, even retraining a fresh model after deleting a record does not ensure its privacy. Second, unlearning algorithms that cache partial computations to speed up processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these issues, we propose a sound deletion guarantee and show that the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an accurate, computationally efficient, and secure machine unlearning algorithm based on noisy gradient descent.
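The core mechanism named in the abstract, noisy gradient descent, can be illustrated with a minimal sketch: train with Gaussian-perturbed gradient steps, then handle a deletion by continuing the same noisy descent on the remaining data from the current parameters instead of retraining from scratch. All hyperparameters (learning rate, noise scale, regularization, step counts) below are illustrative placeholders, not the paper's calibrated values, and the toy ridge-regression objective is an assumption for demonstration only.

```python
import numpy as np

def noisy_gd(X, y, w, steps, lr=0.1, sigma=0.05, rng=None):
    """Projected noisy gradient descent on a ridge-regularized
    least-squares objective (illustrative stand-in for the paper's
    loss). Each step perturbs the update with Gaussian noise, which
    is what a differential-privacy-style deletion guarantee relies on.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + 0.1 * w          # ridge penalty (lambda=0.1, assumed)
        w = w - lr * grad + sigma * rng.normal(size=w.shape)
        w = w / max(1.0, np.linalg.norm(w))             # projection keeps iterates bounded
    return w

# Toy data: learn on the full dataset, then "unlearn" record 0 by
# running a few more noisy steps on the remaining records only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + 0.01 * rng.normal(size=50)

w = noisy_gd(X, y, np.zeros(3), steps=200, rng=rng)     # initial training
w_del = noisy_gd(X[1:], y[1:], w, steps=20, rng=rng)    # cheap deletion update
```

The design point the sketch captures is that the noise, not secrecy of the update rule, is what makes the post-deletion parameters statistically close to a model never trained on the deleted record; the abstract's contribution is a guarantee under which such fine-tuning remains private even across many deletions and releases.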