Paper Title
On the Universality of the Double Descent Peak in Ridgeless Regression
Paper Authors
Paper Abstract
We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
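To illustrate the phenomenon the abstract describes, below is a minimal NumPy sketch (not the authors' code or experimental setup) that traces the test error of ridgeless, i.e. minimum-norm, linear regression on random Fourier features as the number of features p crosses the interpolation threshold p = n. All dimensions, noise levels, and the feature-count grid are illustrative assumptions; with label noise present, the test error typically spikes near p = n, matching the double descent peak the lower bound concerns.

```python
# Minimal sketch (assumed setup, not the paper's experiments): double descent
# of ridgeless regression with random Fourier features under label noise.
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, W, b):
    """Map inputs X of shape (n, d) to p random Fourier features cos(X W + b)."""
    return np.cos(X @ W + b)

# Illustrative problem sizes: n_train is the interpolation threshold.
n_train, n_test, d, noise_std = 50, 1000, 5, 0.5
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + noise_std * rng.normal(size=n_train)  # noisy labels
y_test = X_test @ w_true  # noiseless targets, so the MSE isolates the noise effect

for p in [10, 25, 45, 50, 55, 75, 150, 400]:  # feature counts around p = n_train
    W = rng.normal(size=(d, p))
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)
    Phi_train = random_fourier_features(X_train, W, b)
    Phi_test = random_fourier_features(X_test, W, b)
    # Ridgeless fit: the pseudoinverse gives the least-squares solution for
    # p < n and the minimum-norm interpolant for p >= n.
    theta = np.linalg.pinv(Phi_train) @ y_train
    mse = np.mean((Phi_test @ theta - y_test) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:.3f}")
```

Averaging the printed MSE over several random seeds makes the peak near p = n_train more pronounced; the same loop with a different feature map (e.g. a random two-layer network with tanh activations) exhibits the same qualitative behavior, consistent with the distribution-independent nature of the bound.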