Paper Title

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Authors

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Abstract

Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists, perhaps counterintuitively, in building lightweight models. Specifically, it suggests that overparameterization benefits model pruning/sparsification. This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning in the overparameterized regime. The theory presented addresses the following core question: "should one train a small model from the beginning, or first train a large model and then prune?" We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning rather than simply training with the known informative features. This leads to a new double descent in the training of sparse models: growing the original model, while preserving the target sparsity, improves the test accuracy as one moves beyond the overparameterization threshold. Our analysis further reveals the benefit of retraining by relating it to feature correlations. We find that the above phenomena are already present in linear and random-features models. Our technical approach advances the toolset of high-dimensional analysis and precisely characterizes the asymptotic distribution of overparameterized least-squares. The intuition gained by analytically studying simpler models is numerically verified on neural networks.
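To make the abstract's central comparison concrete, here is a minimal sketch (not the authors' code) of the "train small vs. train large then prune" experiment on a planted sparse linear model: (a) ordinary least squares restricted to the known informative features, versus (b) a minimum-norm overparameterized least-squares fit on all features, followed by magnitude pruning to the same sparsity and an optional retraining step on the retained support. All dimensions, the noise level, and variable names are illustrative assumptions.

```python
# Illustrative sketch only; sizes, noise level, and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 400, 20                 # samples, total features, target sparsity
beta = np.zeros(p)
beta[:s] = rng.normal(size=s)          # informative features: first s coordinates
X = rng.normal(size=(n, p))
y = X @ beta + 0.1 * rng.normal(size=n)

# (a) small model: least squares on the known informative features only
beta_small = np.zeros(p)
beta_small[:s] = np.linalg.lstsq(X[:, :s], y, rcond=None)[0]

# (b) large model: minimum-norm least-squares fit on all p features
#     (overparameterized), then magnitude pruning down to sparsity s
beta_full = np.linalg.pinv(X) @ y
keep = np.argsort(np.abs(beta_full))[-s:]
beta_pruned = np.zeros(p)
beta_pruned[keep] = beta_full[keep]

# optional retraining: refit least squares on the retained support
beta_retrain = np.zeros(p)
beta_retrain[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]

# with isotropic Gaussian test features, excess test risk is ||b - beta||^2
for name, b in [("small", beta_small), ("prune", beta_pruned),
                ("prune+retrain", beta_retrain)]:
    print(name, np.sum((b - beta) ** 2))
```

Sweeping the total feature count p while holding the target sparsity s fixed would let one probe the sparse double-descent behavior the abstract describes, in which the pruned large model can overtake the small model once p moves beyond the overparameterization threshold.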
