Paper Title
Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data
Paper Authors
Paper Abstract
This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated by a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with a guaranteed generalization error. In addition, this paper characterizes, for the first time, the impact of the input distribution on the sample complexity and the learning rate.
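To make the teacher-student setup in the abstract concrete, here is a minimal numerical sketch. It assumes one specific instantiation that the abstract does not spell out: ReLU activations, a student that averages its K hidden units, isotropic mixture components, noiseless labels, squared loss, and plain gradient descent. The dimensions, step size, and initialization below are illustrative placeholders, not the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (placeholders, not the paper's settings):
# d = input dimension, K = hidden neurons, L = mixture components, n = samples.
d, K, L, n = 10, 3, 4, 2000

# --- Gaussian mixture input model: x ~ sum_l lambda_l * N(mu_l, sigma_l^2 I) ---
lam = rng.dirichlet(np.ones(L))          # mixing weights lambda_l
mu = rng.normal(size=(L, d))             # component means mu_l
sigma = rng.uniform(0.5, 1.5, size=L)    # per-component isotropic std devs (assumed)

comp = rng.choice(L, size=n, p=lam)      # latent component of each sample
X = mu[comp] + sigma[comp, None] * rng.normal(size=(n, d))

def relu(z):
    return np.maximum(z, 0.0)

# --- Teacher model with unknown ground-truth weights W*:
#     y = (1/K) * sum_j relu(w_j*^T x)  (noiseless labels, assumed form) ---
W_star = rng.normal(size=(K, d))
y = relu(X @ W_star.T).mean(axis=1)

# --- Student of the same architecture, trained by gradient descent on the
#     empirical squared risk f(W) = (1/2n) sum_i (g(x_i; W) - y_i)^2 ---
W = 0.1 * rng.normal(size=(K, d))        # hypothetical initialization
eta = 0.2                                # step size; the paper ties its admissible range to the input distribution

for _ in range(1000):
    pre = X @ W.T                        # (n, K) pre-activations
    residual = relu(pre).mean(axis=1) - y
    # df/dw_j = (1/(nK)) * sum_i residual_i * 1{w_j^T x_i > 0} * x_i
    grad = (residual[:, None] * (pre > 0.0)).T @ X / (n * K)
    W -= eta * grad

risk = 0.5 * np.mean((relu(X @ W.T).mean(axis=1) - y) ** 2)
print(f"final empirical risk: {risk:.3e}")
```

Because the student shares the teacher's architecture, the empirical risk vanishes at W = W*, which is the kind of critical point the paper's linear-convergence guarantee targets once the number of samples n exceeds the required sample complexity.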