Paper Title
Dimension Independent Generalization Error by Stochastic Gradient Descent
Paper Authors
Paper Abstract
One classical canon of statistics is that large models are prone to overfitting, and model selection procedures are necessary for high-dimensional data. However, many overparameterized models, such as neural networks, perform very well in practice, even though they are often trained only with simple online methods and regularization. The empirical success of overparameterized models, often referred to as benign overfitting, motivates us to take a fresh look at the statistical generalization theory for online optimization. In particular, we present a general theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions. We further discuss data and model conditions that lead to a ``low effective dimension''. Under these conditions, we show that the generalization error either does not depend on the ambient dimension $p$ or depends on $p$ only through a poly-logarithmic factor. We also demonstrate that in several widely used statistical models, the ``low effective dimension'' arises naturally in overparameterized settings. The studied statistical applications include convex models such as linear regression and logistic regression, as well as non-convex models such as $M$-estimators and two-layer neural networks.
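To make the online setting concrete, the following is a minimal sketch (not the paper's exact algorithm, step-size schedule, or analysis) of single-pass SGD for one of the convex applications named above, logistic regression. The function name, the $1/\sqrt{t}$ step-size decay, and the toy overparameterized data are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def sgd_logistic_regression(X, y, step_size=0.1, theta0=None):
    """Single-pass (online) SGD for logistic regression with labels in {-1, +1}.

    Each observation is used exactly once, mirroring the online setting in
    which the SGD iterate is analyzed; the final iterate is returned as the
    estimator whose generalization error one would study.
    """
    n, p = X.shape
    theta = np.zeros(p) if theta0 is None else theta0.copy()
    for t in range(n):
        x_t, y_t = X[t], y[t]                      # one fresh sample per step
        margin = y_t * (x_t @ theta)
        # gradient of the logistic loss log(1 + exp(-y <x, theta>))
        grad = -y_t * x_t / (1.0 + np.exp(margin))
        theta -= step_size / np.sqrt(t + 1) * grad  # decaying step size (assumed schedule)
    return theta

# Toy usage: ambient dimension p larger than sample size n, with a signal
# concentrated on a few coordinates (a crude stand-in for "low effective dimension").
rng = np.random.default_rng(0)
n, p = 200, 500
theta_star = np.zeros(p)
theta_star[:5] = 1.0
X = rng.normal(size=(n, p))
probs = 1.0 / (1.0 + np.exp(-X @ theta_star))
y = np.where(rng.random(n) < probs, 1.0, -1.0)
theta_hat = sgd_logistic_regression(X, y)
```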