Paper Title

Universal mean field upper bound for the generalisation gap of deep neural networks

Authors

Ariosto, S., Pacelli, R., Ginelli, F., Gherardi, M., Rotondo, P.

Abstract

Modern deep neural networks (DNNs) represent a formidable challenge for theorists: according to the commonly accepted probabilistic framework that describes their performance, these architectures should overfit due to the huge number of parameters to train, but in practice they do not. Here we employ results from replica mean field theory to compute the generalisation gap of machine learning models with quenched features, in the teacher-student scenario and for regression problems with quadratic loss function. Notably, this framework includes the case of DNNs where the last layer is optimised given a specific realisation of the remaining weights. We show how these results -- combined with ideas from statistical learning theory -- provide a stringent asymptotic upper bound on the generalisation gap of fully trained DNNs as a function of the size of the dataset $P$. In particular, in the limit of large $P$ and $N_\textrm{out}$ (where $N_\textrm{out}$ is the size of the last layer), with $N_\textrm{out} \ll P$, the generalisation gap approaches zero faster than $2 N_\textrm{out}/P$, for any choice of both architecture and teacher function. Notably, this result greatly improves existing bounds from statistical learning theory. We test our predictions on a broad range of architectures, from toy fully-connected neural networks with a few hidden layers to state-of-the-art deep convolutional neural networks.
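
As a rough numerical illustration of the setting described in the abstract, the sketch below trains only the last layer of a quenched-feature model (fixed random first-layer weights) on a teacher-student regression task with quadratic loss, then compares the measured generalisation gap against $2 N_\textrm{out}/P$. This is a minimal sketch, not the paper's code: the tanh feature map, the linear teacher, and all the sizes (`D`, `N_out`, `P`, `P_test`) are illustrative assumptions.

```python
# Minimal sketch: generalisation gap of a quenched-feature regression model
# in a teacher-student setup, compared with the 2*N_out/P rate from the abstract.
# Feature map, teacher, and sizes are illustrative choices, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
D, N_out, P, P_test = 50, 30, 3000, 20000  # input dim, last-layer size, train/test set sizes


def phi(X):
    """Quenched features: fixed random projection followed by tanh."""
    return np.tanh(X @ W)


def teacher(X):
    """Illustrative linear teacher acting on the raw inputs."""
    return X @ w_teacher


W = rng.normal(size=(D, N_out)) / np.sqrt(D)  # frozen first-layer weights
w_teacher = rng.normal(size=D) / np.sqrt(D)

X_tr = rng.normal(size=(P, D))
X_te = rng.normal(size=(P_test, D))
y_tr, y_te = teacher(X_tr), teacher(X_te)

# Optimise only the last layer by least squares (quadratic loss).
a, *_ = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)


def mse(X, y):
    return np.mean((phi(X) @ a - y) ** 2)


gap = mse(X_te, y_te) - mse(X_tr, y_tr)  # generalisation gap = test - train error
print(f"generalisation gap = {gap:.5f}   vs   2*N_out/P = {2 * N_out / P:.5f}")
```

With $P \gg N_\textrm{out}$, as here, the measured gap should be small; shrinking $P$ toward $N_\textrm{out}$ makes the gap grow, which is the regime the paper's bound is designed to control.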
