Paper Title

Geometry Perspective Of Estimating Learning Capability Of Neural Networks

Paper Authors

Ankan Dutta, Arnab Rakshit

Paper Abstract

The paper uses statistical and differential-geometric motivation to acquire prior information about the learning capability of an artificial neural network on a given dataset. The paper considers a broad class of neural networks with generalized architecture performing simple least-squares regression with stochastic gradient descent (SGD). The system characteristics at two critical epochs in the learning trajectory are analyzed. During some epochs of the training phase, the system reaches equilibrium, with the generalization capability attaining a maximum. The system can also be coherent with localized, non-equilibrium states, which are characterized by the stabilization of the Hessian matrix. The paper proves that neural networks with higher generalization capability have a slower convergence rate. The relationship between the generalization capability and the stability of the neural network is also discussed. By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
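To make the abstract's setup concrete, below is a minimal, hypothetical sketch (not the authors' code) of least-squares regression trained with SGD, logging the full-batch loss Hessian each epoch. All names, sizes, and hyperparameters are illustrative assumptions; note that for a purely linear model the least-squares Hessian is constant, so the epoch-wise "stabilization of the Hessian" the paper describes would only be visible with a nonlinear model in place of this one.

```python
import numpy as np

# Hypothetical minimal sketch: linear least-squares regression fit with
# mini-batch SGD, logging the full-batch loss Hessian per epoch as a
# proxy for the critical-epoch monitoring described in the abstract.

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)
lr, epochs, batch_size = 0.01, 50, 20  # illustrative hyperparameters

for epoch in range(epochs):
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of 0.5 * MSE
        w -= lr * grad
    # For linear least squares the full-batch Hessian is the constant
    # X^T X / n; with a nonlinear model it would evolve across epochs,
    # and its settling would mark the regime the abstract describes.
    hessian = X.T @ X / n_samples
    loss = 0.5 * np.mean((X @ w - y) ** 2)
    print(f"epoch {epoch:2d}  loss {loss:.5f}  ||H||_F {np.linalg.norm(hessian):.3f}")
```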
